DOCUMENT RESUME

ED 297 024                                                      TM 011 999

AUTHOR        Phillips, Linda M.
TITLE         The Design and Development of the Phillips-Patterson
              Test of Inference Ability in Reading Comprehension.
INSTITUTION   Illinois Univ., Urbana. Center for the Study of
              Reading.; Memorial Univ., St. John's (Newfoundland).
              Inst. for Educational Research and Development.
SPONS AGENCY  Social Sciences and Humanities Research Council of
              Canada, Ottawa (Ontario).
PUB DATE      Feb 88
GRANT         410-85-1321
NOTE          135p.
PUB TYPE      Reports - Research/Technical (143);
              Tests/Evaluation Instruments (160)
EDRS PRICE    MF01/PC06 Plus Postage.
DESCRIPTORS   *Difficulty Level; Elementary Education; *Elementary
              School Students; Grade 6; Grade 7; Grade 8; Guessing
              (Tests); *Inferences; *Multiple Choice Tests; Reading
              Comprehension; *Reading Tests; Story Reading; *Test
              Construction; Test Items; Test Wiseness
IDENTIFIERS   Canadians; *Test Inference Ability in Reading
              Comprehension



ABSTRACT

The design and development of a test of inference ability in reading
comprehension for grades 6, 7, and 8 (the Phillips-Patterson Test of
Inference Ability in Reading Comprehension) are described. After development
of a contemporary theoretical framework for the test of inference ability in
reading comprehension, the design, item development, and test development
iterations of the test are outlined. The test was administered to 999
students from schools in Alberta, Newfoundland, Labrador, Nova Scotia, and
Ontario, Canada. Test score results were analyzed by province, sex, grade,
and age group. Data analysis provided results on item difficulty, the
relationship of item performance to overall performance and of story
performance to overall performance, test wiseness, guessing, and
Kuder-Richardson 20 Reliability Indices. A copy of the test is appended.
(TJH)



***********************************************************************
*  Reproductions supplied by EDRS are the best that can be made       *
*  from the original document.                                        *
***********************************************************************







THE DESIGN AND DEVELOPMENT OF THE PHILLIPS-PATTERSON 
TEST OF INFERENCE ABILITY IN READING COMPREHENSION 



LINDA M. PHILLIPS 
INSTITUTE FOR EDUCATIONAL RESEARCH AND DEVELOPMENT 
MEMORIAL UNIVERSITY OF NEWFOUNDLAND 
ST. JOHN'S, NEWFOUNDLAND A1B 3X8
(709) 737-8625

AND 

CENTER FOR THE STUDY OF READING 
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 



FEBRUARY 1988 



TABLE OF CONTENTS

ACKNOWLEDGEMENTS

CHAPTER ONE
INTRODUCTION

CHAPTER TWO
THE NEED FOR AND CONCEPTUALIZATION OF A TEST OF INFERENCE ABILITY
    Defining Reading Comprehension
    Inference in Reading Comprehension
    The Objectives of the Test

CHAPTER THREE
A PRINCIPLE OF INFERENCE APPRAISAL
    Justification of the Principle
    Summary

CHAPTER FOUR
SPECIFICATIONS OF THE PHILLIPS-PATTERSON TEST
OF INFERENCE ABILITY IN READING COMPREHENSION
    Test Development Framework
    Audience
    Kinds of Discourse
    Topic Familiarity
    Readability
    Test Format
    Test Length
    Item Development
    Selection of Topics
    Stage One: Graduate Students
    Stage Two: Teachers
    Stage Three: Students (Topic Identification)
    Stage Four: Students (Unassigned Written Essays)
    Stage Five: Students (Assigned Written Essays)
    Stage Six: Final Topic Selection
    Principles of Story Writing

CHAPTER FIVE
THE EVOLUTION OF THE PHILLIPS-PATTERSON
TEST OF INFERENCE ABILITY IN READING COMPREHENSION
    Experimental Test Versions
    Preliminary Test Version
    Pilot Study One (Short-answer Version)
    Trial One
    Sample and Procedure
    Results
    Test Revisions
    Trial Two
    Sample and Procedure
    Results
    Test Revisions
    Pilot Study Two (Short-answer/Scaled-answer
    Multiple-choice Versions)
    Sample and Procedure
    Results of Multiple-choice Format
    Results of Short-answer Format
    Test Revisions
    Pilot Study Three (Verbal Reports as Data)
    Sample and Procedure
    Scoring Responses
    Answer Set Revisions
    Vocabulary and Question Revisions
    Story Passage Revisions
    Pilot Study Four (Expert Sample)
    Pilot Study Five
    Test Validation
    Sample and Procedure
    Coding
    Data Analysis
    Results and Discussion
    Reading and Thinking Relationships for Items
    Reading and Thinking Relationships for Stories
    Reading Performance Relationships between Verbal Report
    and Written Cohorts
    The Relationship of Item Performance to Story Performance
    Reading Scores by Grade
    Interviewer Effect on Test Performance
    Summary

CHAPTER SIX
FINAL DATA COLLECTION: ANALYSIS AND RESULTS
    Samples and Data Collection
    Sample Demographics
    Analysis and Results
    [illegible] Province, Sex, Grade, and Age
    Item Statistics
    Item Difficulty
    The Relationship of Item Performance to Overall Performance
    The Relationship of Story Performance to Overall Performance
    Potential Extraneous Influences
    Test-taking Strategies
    Test Wiseness
    Guessing
    Kuder-Richardson 20 Reliability Indices

CHAPTER SEVEN
SUMMARY OF PRESENT EFFORTS AND FUTURE PROSPECTS

REFERENCES

APPENDIX A: The Phillips-Patterson Test of Inference Ability in Reading
            Comprehension
APPENDIX B: Reading Rating Scale for the Phillips-Patterson Test of Inference
            Ability
APPENDIX C: [illegible] Scale for the Phillips-Patterson Test of Inference
            [illegible]
APPENDIX D: Directions to Teachers
APPENDIX E: Key to TIA Answer Scale

LIST OF TABLES



LIST OF TABLES

Table 5-1: Pilot 2, Item Statistics
[illegible] Thinking Scores by Story
Table 5-[illegible]: Pilot 5, Story Reading Means by Cohort
[illegible] Story [illegible] Verbal Report and Written [illegible]
Table 6-1: Mean Scores for Province [illegible]
[illegible] Means, Variances, and KR-20 Reliabilities for Story and Total
Table 6-5: [illegible]
Table 6-6: [illegible]
Table 6-7: [illegible]

ACKNOWLEDGEMENTS 

The research reported in this monograph was supported by a grant from the
Social Sciences and Humanities Research Council of Canada, Grant No.
410-85-1321. The views expressed are those of the author and not the granting
agency. This monograph was prepared while the author was on sabbatical leave
as a Visiting Scholar at the Center for the Study of Reading, University of
Illinois at Urbana-Champaign. The author would like to thank the Center for
the help of its support staff and the use of its facilities.

The cooperation of the students, teachers, and principals of the schools
that participated in this project is appreciated. Mr. John Hart,
Superintendent of the Placentia-St. Mary's School Board, and Mr. Hubert
Furey, Superintendent of the Conception Bay Centre School Board, expressed
interest in this project from the outset and extended the utmost cooperation
for the duration of the project. I thank the following people for
participating in the final data collection and for the tremendous amount of
work in facilitating data collection in their province: Arthur Bull
(Newfoundland), Ruth Eagan (New Brunswick), Frances Gatherall (Labrador),
Gary Heck (Alberta), Brian McAndrews (Ontario), and Elizabeth Rice (Nova
Scotia). I express my gratitude to all who participated in the project, for
without their cooperation this research would not have been possible.

I thank Cynthia Patterson, my research assistant, for her valuable
contribution to the project. Special thanks go to Stephen P. Norris for
providing insightful and constructive comments at different stages of this
work.






CHAPTER ONE 
INTRODUCTION 

This report describes the design and development of a test of inference
ability in reading comprehension for grades six, seven, and eight. The
assumption that the ability to make inferences is necessary to reading
comprehension is widely accepted by reading theorists and researchers. This
observation, coupled with the fact that no satisfactory procedures exist for
determining the extent to which children make good inferences when attempting
to understand text, motivated this project.

During the last fifteen years significant strides have been made in
unravelling the reading process. The assumption has been, and continues to
be, that the more we understand how readers read and learn from text, the
better able we will be to teach them to do so. One way to determine how well
we have succeeded in teaching students is through tests which can help us
measure our students' reading ability as well as the effectiveness of the
instruction they receive. Unfortunately, most tests of reading comprehension
yield at best superficial and vague information about reading ability.

Over a decade ago J. Jaap Tuinman (1973-1974) remarked that five major
standardized tests of reading comprehension were so constructed that users of
them were left guessing about whether or not they were valid. It seems that
little may have changed in the testing of reading comprehension since that
time. In an extensive review of the literature on testing reading (Farr &
Carey, 1986), standardized tests of reading were found problematic because it
was not easy to make any decision as to what the tests measure.

Based on the work of others and my experience, I present the following
three points to highlight some of the problems taken to be most relevant to
the development of a test of inference ability in reading comprehension:

1. Most tests of reading comprehension define reading comprehension
broadly and vaguely, if at all, and simply test the component parts but
ignore the integration of those parts;

2. Most tests of reading comprehension test general knowledge, not
reading comprehension;

3. Most tests of reading comprehension are based on the assumption that
when a reader selects the right answer (s)he has done so for the right
reasons.

A recent issue of The Reading Teacher (April, 1987) is devoted to the
state of reading assessment. The articles in the journal repeatedly condemn
the way reading is assessed and claim that traditional tests do not assess
what is currently known about the reading process (Calfee, 1987; Durkin,
1987; Johnston, 1987; Valencia & Pearson, 1987; Wittrock, 1987; Wixson,
Peters, Weber, & Roeber, 1987). In sum, while it is generally agreed that
significant strides have been made in reading theory, it seems evident that
reading comprehension assessment is not in tune with advances in reading
research and theory. Clearly, the time has come to recognize the
contribution which research can make to educational reform in the testing of
reading comprehension. The research described in this report recognizes such
a contribution.

The plan of this report is built around four components. Chapters One,
Two, and Three offer a contemporary theoretical framework for the test of
inference ability in reading comprehension and set the foundation for
subsequent chapters. I describe the design, item development, and test
development iterations in Chapters Four and Five. The discussion of the
final data collection and results makes up Chapter Six. The report concludes
in Chapter Seven with a summary of current research and a statement of future
challenges for work on inference in reading comprehension.

The following chapter will discuss the need for and conceptualization of a
test of inference ability. Reading comprehension and inference will be
defined. Without a working definition of what it is we wish to measure,
there would be no sound guidance as to what to look for as indicators of
inference ability in reading comprehension.



CHAPTER TWO 

THE NEED FOR AND CONCEPTUALIZATION OF A TEST OF INFERENCE ABILITY 
Evidence abounds to suggest that poor reasoning is prevalent in our
students. The National Assessment of Educational Progress ([year illegible])
found large decreases in inferencing responses of 13 and 17 year olds over a
ten-year period. Furthermore, the Nation's Report Card (Applebee, Langer, &
Mullis, 1987) reports that only small percentages of students can reason
effectively as they read and write. Such findings suggest that students are
not being guided to perform reasoning activities which require analysis and
interpretation.

These findings are not surprising when coupled with research on teaching
practices. In the late 1970's Dolores Durkin reported that schools do not
teach comprehension; precious little time was devoted to having students
explain and substantiate their interpretations. The authors of the Report of
the Commission on Reading (Anderson, Hiebert, Scott, & Wilkinson, 1985)
lamented that there is very little direct comprehension instruction in most
American classrooms. The increased evidence of poor reasoning has led to
claims about deficiencies in school programs and to appeals for action, such
as more testing. Yet, most available tests are general and vague about the
nature of reading comprehension and do not support instructional improvement.
It would make sense that a test designed specifically to measure inference
ability and to support instructional improvement would be an important place
to start.

In my study of the literature on testing for reading comprehension, and
particularly on testing for inference ability, standardized tests of reading
were found problematic because it is not easy to make any decision as to what
they measure. To highlight some of the problems, it is necessary to
recognize at the start that prominent researchers in the field declare that
reading assessment bears a number of substantial flaws. Elaborating on a
couple of pieces of research reported in Chapter One will serve to
illustrate some of the flaws.

Tuinman (1973-74) investigated five widely used standardized tests
according to the extent to which questions on the tests could be answered
without reading the passages upon which those questions were based. Two
points emerged. First, although Tuinman was cautious about his conclusions
he found significant reason to believe that it was general knowledge and not
reading comprehension which was being measured. He suggested that more
exploration was needed to discover the extent of this test invalidity.
Second, these five major tests did not provide any technical information on
the extent to which items could be answered on the basis of information other
than that present in the passage. That is, the tests have failed to
address this significant construct validity issue.
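The kind of check that Tuinman's question calls for can be sketched in a few
lines of Python. This is an illustration only, not a procedure taken from
Tuinman or from this project. It assumes four-option multiple-choice items
(so chance is 0.25), invented counts of students who answered each item
correctly without seeing the passage, and a one-sided exact binomial test
with an arbitrary cut-off of 0.01. The function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def passage_independent_items(no_passage_correct, n_students, n_options=4, alpha=0.01):
    """Flag items answered correctly above chance WITHOUT reading the passage.

    no_passage_correct maps an item label to the number of students who
    chose the keyed answer when the passage was withheld (hypothetical data).
    """
    chance = 1 / n_options
    flagged = []
    for item, correct in no_passage_correct.items():
        # A small tail probability means guessing alone is an unlikely explanation.
        if binomial_upper_tail(correct, n_students, chance) < alpha:
            flagged.append(item)
    return flagged


# Invented counts for 40 students; not data from any real test.
counts = {"item_1": 12, "item_2": 31, "item_3": 9}
flagged = passage_independent_items(counts, n_students=40)
print(flagged)
```

An item flagged in this way may be measuring general knowledge rather than
comprehension of the passage, which is the construct validity concern raised
above.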


The question of validity is a major concern because the predominant
approach to construct validity in standardized reading tests, that of
correlations with other tests, has a significant weakness. It is based on
rather circular reasoning because two tests may be intended to measure the
same ability, be highly correlated, and still fail to measure what they were
intended to measure. That is, they can possess what Embretson (Whitely)
(1983) has called "nomothetic span", but still fail to be representative of
the construct they are proposed to measure.
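The circularity can be made concrete with a small numerical sketch. The
scores below are invented purely for illustration and do not come from any
real test: two hypothetical tests, A and B, track one another closely, yet
neither tracks a hypothetical direct measure of inference ability. The
helper `pearson_r` is an ordinary Pearson correlation written out by hand.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented scores for eight students (assumption: illustration only).
test_a = [1, 3, 3, 5, 5, 7, 7, 9]             # hypothetical reading test A
test_b = [2, 2, 4, 4, 6, 6, 8, 8]             # hypothetical reading test B
inference_ability = [4, 8, 1, 6, 3, 2, 7, 5]  # hypothetical criterion measure

r_ab = pearson_r(test_a, test_b)
r_a_inf = pearson_r(test_a, inference_ability)
r_b_inf = pearson_r(test_b, inference_ability)

print(f"A with B:         {r_ab:.2f}")
print(f"A with criterion: {r_a_inf:.2f}")
print(f"B with criterion: {r_b_inf:.2f}")
```

In this constructed case the high correlation between A and B says nothing
about whether either test represents the intended construct, which is the
weakness of validating one test only against another.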

Many of the problems with standardized reading tests remain unresolved and
by continuing to use poorly-produced tests we are not recognizing the
concerns raised. Anderson (1972) reported that educational researchers have
not yet learned to develop achievement tests that meet the primitive first
requirement for a system of measurement, namely that there is a clear and
consistent definition of the things being counted. A search for definitions
of reading comprehension in the major established reading tests (many of
which were developed prior to the 1970's) reveals that for all intents and
purposes none exist. Generally, each of the manuals says nothing more than
that the test items measure specific skills to comprehend what is explicit in
the material, to judge what is implied, and to draw inferences with reference
to other situations. The tests do not identify which specific skills are
being measured by particular items, nor do the tests report separate scores
for specific skills. Rather, a composite score of comprehension, vaguely
defined, is reported.

Much work is required to give rise to orderly, sensible data in the
evaluation of reading comprehension in general and inference ability in
particular. It is my opinion that as a start much can be learned and applied
to reading from researchers in the field of critical thinking, who have
attempted to elaborate and clearly define the nature of inference (Govier,
1985; Hitchcock, 1983; Norris, 1984; Salson, 1984; Scriven, 1976). In
particular, I refer to the extensive work of Robert Ennis (1962, 1969, 1981,
1985). His efforts in characterizing rational thinkers have been and
continue to be an invaluable source of ideas for the project, as were
available critical thinking tests (Ennis & Millman, 1985; Watson & Glaser,
1980) and a test of induction currently under development (Norris & Ryan,
1987) which include sections testing for inference-making ability.

Defining Reading Comprehension 
As might be expected, reading comprehension has been defined in many ways.



Articles and books (Plato, Farnham, 1905; Huey, 1908; Thorndike, 1917;
Richards, 1938; Gray, 1940; Carroll, 1964; Goodman, 1968; Anderson & Pearson,
1984) have been written on and about reading comprehension. While each
article and book contributes to a more thorough understanding of reading
comprehension, each is incomplete. At times, there are mismatches among the
definitions of reading comprehension, thereby suggesting that more is yet to
be learned. It is beyond the province of this project to attempt to provide
a complete and comprehensive definition of such a complex act as reading
comprehension. For the purposes of this project, I preferred to remain
tentative and prepared to alter my working definition of reading
comprehension with subsequent information.

Reading comprehension is believed to be a collection of processes such as
predicting, inferring, synthesizing, generalizing, and monitoring, which have
been identified and labelled in various ways by different writers in the
field (Collins, Brown, & Larkin, 1980; Fagan, 1987; Henry, 1974; Smith,
1971). It is widely accepted that reading comprehension involves more than
knowing the correct pronunciation of the words, knowing their individual
meanings, and being able to locate information in printed material (Norris &
Phillips, 1987; Phillips & Norris, 1987; Spiro, 1977; Tuinman, 1986). Current
reading theory defines reading comprehension, more or less, as meaning
constructed by a reader through strategic and principled integration of the
textual information and background knowledge.

Since an explanation of the intricacies of reading comprehension remains
elusive, and since it is agreed that reading comprehension is a complex
behaviour which continues to be perplexing, then one cannot set about
assessing it in its entirety. Thus, it seems to make sense to study specific
aspects of the process as a means of seeking advancements in the assessment
of the complete process of reading comprehension. It is with the process of
inference as an aspect of reading comprehension that I am most concerned.

Inference in Reading Comprehension
At a general level, inference is a cognitive process used to construct
meaning. Inferring in reading comprehension is a constructive thinking
process, because a reader expands knowledge by proposing and evaluating
hypotheses about the meaning of text.

Good inference-making in reading comprehension requires the thoughtful use
of strategies (Collins, Brown, & Larkin, 1980; Phillips, 1983; 1987; in press;
van Dijk & Kintsch, 1983) and evaluative criteria. Inferences in reading
comprehension tend to be good to the extent that readers integrate relevant
text information and background knowledge to construct complete
interpretations that are consistent with both the text information and
background knowledge.

At a specific level, inference requires intelligent human judgement
(Ennis, 197[?]) and necessitates the use of relevant text information and
background knowledge. This dependence on background knowledge is important
for at least three reasons. First, an inference in reading comprehension is
the interaction of relevant information provided in the text and background
knowledge. In other words, neither textual information nor background
knowledge alone is sufficient to make good inferences. Second, background
knowledge enables the generation of alternative hypotheses in inferring.
Inference is the basis of understanding which often involves transforming,
extending, and relating information (Markman, 1981). Third, without
background knowledge one cannot evaluate the strength of inferences to
generalizations and explanations (Govier, 1985), thereby making background
knowledge a necessary part of inferential reasoning.

The Objectives of the Test 
Having implied the complexity of the reading comprehension process in
general, and having described the process of inference as one aspect of
comprehension in particular, it is important to reiterate that comprehension
is a complicated cognitive process. Indeed, there may be considerable
overlap and interdependence among inferring and the other comprehension
processes of attending, analyzing, associating, predicting, synthesizing,
generalizing and monitoring. A general test of comprehension ability would
focus on each of the processes whereas the Phillips-Patterson Test of
Inference Ability in Reading Comprehension (TIA) (1987) focuses specifically
on the process of inference-making. Such a concentrated focus allows for
detailed information on a particular process.

TIA is designed to appraise the inference ability of middle grade students
on the basis of full length passages representative of the three kinds of
discourse commonly found at the middle grade levels and of topics
characteristic of classroom reading materials. TIA is designed to inform
teachers about students' inference ability rather than to label students, and
to provide diagnostic information for instructional decision-making purposes.



CHAPTER THREE
A PRINCIPLE OF INFERENCE APPRAISAL
The general guideline of ability test validation that directed the
research was that the test be valid to the extent that good inference-making
leads to good performance on the test and that poor inference-making
leads to poor performance. To be in a position to distinguish good
inference-making from poor inference-making implies that there must be
standards for making such distinctions. Reading educators should not be
satisfied to accept just any inference merely because it reflects some level
of the reader's cognitive competence. When we judge someone's inference to be
normatively good we are comparing it to what we take to be some standard of
expert competence. So, it is important that the best interpretations are
inferences in accord with the best available principles. To be in a position
to improve reasoning means to be in a position to distinguish good reasoning
from bad. To do so implies that there must be principles and standards.

To apply a set of standards to the quality of inference-making in reading
comprehension certain assumptions about the reader, the task, and the text
must be made in order to get off the ground. These presuppositions or
necessary conditions are stated as follows:

I. A reader must:

    1. be competent with the difficulty level of the text;

    2. understand the demands of the task; and

    3. intend to understand the text.

II. A text must:

    1. be written coherently;

    2. adhere to conventions of communication by being:

        a. as informative as is required for the situation;

        b. accurate or complete with adequate evidence for
           asserted information;

        c. relevant to the ongoing situation; and

        d. unambiguous and clear.

If these conditions are not met, poor performance on the inference task may
be explained through failure to satisfy one, several, or all of these
conditions, rather than as a lack of inference ability.

The satisfaction of conditions I and II is necessary for the application
of the following principle of inference appraisal to judgments of readers'
inference ability in reading comprehension:

Inferences in reading comprehension tend to be good to the extent that a
reader integrates relevant text information and background knowledge to
construct complete interpretations that are consistent with both the text
information and background knowledge.
By complete I mean that the interpretation explains all relevant information.
By consistent I mean consistent with what is known to be true: the
interpretation does not contain statements that are known to be false, or
require as assumptions statements that are known to be false. In short, the
interpretation is possible given what is known. Completeness and consistency
are thus the two criteria for judging interpretations. Neither criterion by
itself is sufficient; they must be used in tandem.

In order to deal with situations where there are competing
interpretations, the criteria must also be used comparatively. We must ask
which interpretation is more complete, and more consistent, because often
neither interpretation will be fully complete and fully consistent (Norris &
Phillips, 1987; Phillips & Norris, 1987). Thus, the expression "tends to be
good to the extent that" is an important part of the principle. The
expression is a qualifier which signifies the limitations of the principle
and emphasizes that it is not an absolute principle.
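The two criteria and their comparative use can be sketched in Python. This
is a simplified illustration under stated assumptions, not the scoring
procedure of the TIA: an interpretation is reduced to the set of relevant
pieces of information it explains and the set of statements it contains or
assumes, completeness is taken as the fraction of relevant information
explained, and consistency as the absence of statements known to be false.
The rule that one interpretation is better only when it is no worse on both
criteria and the two are not tied is also an assumption made here; the names
and the example story are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    label: str
    explains: frozenset  # relevant information the interpretation accounts for
    asserts: frozenset   # statements the interpretation contains or assumes


def completeness(interp, relevant_information):
    """Fraction of the relevant information that the interpretation explains."""
    if not relevant_information:
        return 1.0
    return len(interp.explains & relevant_information) / len(relevant_information)


def inconsistencies(interp, known_false):
    """Statements contained or assumed that are known to be false."""
    return interp.asserts & known_false


def better(a, b, relevant_information, known_false):
    """Return the better interpretation, or None when the criteria do not
    favour either one (a tie, or each is ahead on a different criterion)."""
    ca = completeness(a, relevant_information)
    cb = completeness(b, relevant_information)
    ia = len(inconsistencies(a, known_false))
    ib = len(inconsistencies(b, known_false))
    a_no_worse = ca >= cb and ia <= ib
    b_no_worse = cb >= ca and ib <= ia
    if a_no_worse and not b_no_worse:
        return a
    if b_no_worse and not a_no_worse:
        return b
    return None


# Hypothetical story facts and background knowledge.
relevant = frozenset({"the pie is missing", "the window is open",
                      "there are wet footprints"})
known_false = frozenset({"the dog opened the window"})

a = Interpretation("A", explains=relevant,
                   asserts=frozenset({"a visitor came in through the window"}))
b = Interpretation("B", explains=frozenset({"the pie is missing"}),
                   asserts=frozenset({"the dog opened the window"}))

winner = better(a, b, relevant, known_false)
print(winner.label if winner else "no clear preference")
```

Returning no preference when the criteria pull in different directions
reflects the point that neither criterion is sufficient by itself, and that
the principle is a qualified rather than an absolute one.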

Justification of the Principle 
The work of researchers in four fields provides evidence for both the
derivation and justification of the principle of inference appraisal
presented in this research: critical thinking (Ennis, 1969, 1981, 1985);
philosophy and philosophy of science (Harman, 1986; Thagard, 1978, 1982,
1986); cognitive psychology (Holland, Holyoak, Nisbett, & Thagard, 1986;
McCloskey, 1983; Nisbett & Ross, 1980; Stich & Nisbett, 1980); and reading
(Collins, Brown, & Larkin, 1980; Markman, 1981; Mason, 1984; Norris &
Phillips, 1987; Phillips & Norris, 1987).

The most extensive work done on inference criteria known to me is that of
Robert Ennis (1969, 1981). He uses the expression "material inferences" and
divides these into two categories: those which generalize the evidence which
is offered, and those which derive their support from their power to explain
the evidence. The latter category is most representative of the kinds of
inferences made in reading comprehension and is thus the focus of this
discussion. Ennis presents criteria for judging inferences to explanations.
The inferences are justified to the extent that: 1. They explain a bulk and
variety of reliable data; 2. They are themselves explained by a satisfactory
system of knowledge; 3. They are not inconsistent with available evidence;
4. Their competitors are inconsistent with evidence; and 5. They are simpler
than their competitors.

The first criterion is covered in the principle of inference appraisal
where it is stated that a reader integrates relevant text information and
background knowledge to construct interpretations that are complete, that is,
interpretations that explain all relevant information. Ennis's second
criterion (inferences are themselves explained by a satisfactory system of
knowledge) and his third (inferences are not inconsistent with available
evidence) are incorporated into the principle of inference appraisal where it
says that interpretations are consistent with both the text information and
background knowledge. That is, the interpretation does not contain statements
that are known to be false, or require as assumptions statements known to be
false. Competing interpretations that are inconsistent with available
evidence would be judged to be poor given the principle of inference as it is
stated, thereby automatically incorporating Ennis's criterion 4 into the
principle. Ennis's fifth criterion (inferences are justified to the extent
that they are simpler than their competitors) is embedded in that part of the
principle where it is stated that a reader integrates relevant text
information and background knowledge. Irrelevant information can lead to a
convoluted interpretation rather than a straightforward one based on relevant
information.

A second source of support for the principle derives from research on
failures in everyday reasoning. According to Nisbett and Ross (1980)
shortcomings in human inference-making reflect people's failure to use
normative principles and instead to apply simplistic inferential strategies
beyond their appropriate limits. They caution that human inference is prone
to several major sources of error including, to mention two, an over-reliance
on just the information which happens to be available, and an inappropriate
weighing of data relevance. Evidence of these two errors has particular
bearing for a principle of inference appraisal in reading comprehension. In
the case of the first error, readers often place greater reliance on the text
information. In the second case, readers may place too great a reliance on
some of the textual information or on their background knowledge, thereby
failing to properly integrate relevant information from both. The point is
that any principle of inference appraisal in reading comprehension must
emphasize the necessity of using both relevant text information and
background knowledge and of properly weighing the relevance of each.

Nisbett and Ross (1980) also present evidence that more vivid or salient
information is more likely to enter inferential processes than is less vivid
information. Salient information may influence unduly a person's
inference-making. Other research has illustrated the tendency for ideas, once
formulated or adopted, to persist despite evidence which might be
disconfirmatory (Holland, Holyoak, Nisbett, & Thagard, 1986; McCloskey,
1983). It seems people will point to scant positive evidence to sustain their
original interpretation even though substantial negative evidence exists to
suggest otherwise. Thus, when some people read and are faced with
counterevidence, they will tend either to ignore the evidence or to
misconstrue it to advantage (Phillips, 1987). It seems that a workable
principle of inference appraisal must provide a standard against which
readers can monitor whether their interpretations are the best explanations,
that is, are consistent and complete, or are unduly influenced by one or all
of the factors mentioned above.

A third source of support for the principle of inference appraisal is
garnered from work in the philosophy of science on inference to the best
explanation. Inference to the best explanation consists in accepting a
hypothesis on the grounds that it provides a better explanation of the
evidence than is provided by alternative hypotheses. Three important
criteria are proposed by Thagard (1978) for determining the best
explanation: consilience, simplicity, and analogy. An explanation is more
consilient than another if it explains more of the evidence than the other by
unifying and systematizing the information while at the same time being
informative. A simple, consilient explanation not only explains all that is
necessary, but does so without making a host of assumptions with narrow
application, merely derived for the moment. The first two of Thagard's
criteria, consilience and simplicity, thus offer support for the standards of
"completeness" and "consistency" defined in the principle of inference
appraisal.

Another source of support for the principle of inference appraisal rests
within the reading field. Ellen Markman (1981) in her work on comprehension
monitoring acknowledged that distinguishing a good inference from a poor one
is complex and closely tied to distinguishing better explanations or better
theories. She poses the question of how readers decide whether or not they
have understood. Markman shows how theories of comprehension inform theories
of comprehension monitoring by describing two fundamental aspects of
comprehension. She argues that well organized or tightly structured
information is essential to reading comprehension, that comprehension often
potentiates the making of inferences, and that the two are interrelated. I
propose the following points based on Markman's work on comprehension
monitoring: 1. good inferences are highly constrained by the context (text
and background knowledge); 2. good inferences are based on warranted
assumptions and are progressive in that they subsume previous ideas from the
context; 3. good inferences are judgments confirmed by subsequent information
from the context; and 4. good inferences are judgments having elegance and
parsimony within the context. The constraints imposed by context (text and
background knowledge), in the four points above, are embedded in the
principle of inference appraisal, thereby indicating that context provides
both the subject matter (relevant text information and background
knowledge) and the parameters (... to construct complete
interpretations that are consistent with both the text information and
background knowledge) of the interpretation. Warranted assumptions (point 2)
and inferences that have elegance and parsimony (point 4) are integrated into
the principle of inference appraisal in reading comprehension through use of
the words "complete" and "consistent" as defined earlier in this chapter.

A further elaboration and confirmation of the above four points is found
in the work of Collins, Brown, and Larkin (1980) where adult subjects applied
at least four different tests in evaluating the plausibility of the
interpretations they constructed. The four tests include: 1. The
plausibility of the assumptions and consequences of the model (when a default
assumption or a consequence of the interpretation seems implausible, then
subjects tend to reject the interpretation); 2. The completeness of the model
(interpretations are evaluated in terms of how well the assumptions and
consequences of the model answer all the different questions that
arise); 3. The interconnectedness of the model (the assumptions or
consequences of an interpretation are weighted with respect to how they fit
together with other aspects of the model); and 4. The match of the model to
the text (occasionally, readers seem to weigh the model in terms of how well
its assumptions or consequences match certain surface aspects of the text).
Within Collins, Brown, and Larkin's model the integration of text information
and background knowledge in the construction of interpretations is explicitly
stated, as are the criteria used by adults to test the "fit" of their
interpretations.

Summary 

The principle of inference appraisal proposed is representative of what is
currently known about inference and provides a framework within which to
better understand the process of inference-making in reading comprehension.
The principle is intended to be neither canonical nor comprehensive but
rather to be an advance toward a set of principles. The principle of
inference appraisal must be considered tentative and alterable in the light
of both further understanding and empirical evidence. However, as shown, it
is supported by researchers in the critical thinking, philosophy and
philosophy of science, cognitive psychology, and reading fields. There is a
remarkable compatibility and overlap in the work, as can be seen by the
notions of completeness, consistency, and clarity which all see as criteria
of sound inferences.






CHAPTER FOUR
SPECIFICATIONS OF THE PHILLIPS-PATTERSON
TEST OF INFERENCE ABILITY IN READING COMPREHENSION

It is difficult to separate the design and development of a test; however,
since somewhat distinct decisions were made about each, I have opted to
devote a separate chapter to each. This chapter will provide the
specifications on the design of TIA.

Test Development Framework
Audience

The intended audience for TIA is students in the middle grades. Students
in grades six, seven, and eight were selected for both theoretical and
practical reasons.

Some of the basic tenets of reading development guided my theoretical
decisions. While it is generally agreed that reading development is
continuous, it is also agreed that there are stages of development. By the
time students have advanced to the middle grades, they have read graded
materials and content area subjects, and have generally achieved some degree
of independence in the reading process. These facts make it more manageable to
separate out inference ability problems, should they exist, from other
problems such as vocabulary, syntax, or other failures (Anderson & Pearson,
1984; Gentner, 1983; Vosniadou & Ortony, 1983).

There is research which suggests that there are developmental differences
in story comprehension (McConaughy, 1980). It seems that even grade five
students focus more on the literal aspects than on the interpretative
aspects. Another reason for selecting middle grade students is that they have
had some instruction in making inferences or in what is generally referred to
in basal reader manuals as "reading between the lines".



A more practical reason resides in the fact that grades six, seven, and
eight are the levels with which I am most familiar in terms of both teaching
experiences and other research projects.

Kinds of Discourse

Discourse is typically classified as one of four types: (i) exposition,
(ii) narration, (iii) description, and (iv) argument (Bock & Brewer, 1985;
Brewer, 1980; Spiro & Taylor, 1987). Exposition answers real or imaginary
questions. It presents facts or explains why something is important, how
something works, or what a thing means. Narration informs readers of what is
happening; it is an account of events or action and includes characters,
plot, theme, and style. Description is a discourse form used to appeal to the
senses of the reader and is generally about the appearance of an object, a
person, or an event. Argument is a form of discourse in which there is an
attempt to convince or persuade through appeals to reason, emotions, or to
both. Exposition, narration, description, and argument often overlap, so this
global classification omits much of the complexity of discourse. In practice,
clear-cut classifications are not always possible.

Three of the four kinds of discourse are more familiar to students in the
middle grades; argument is less familiar. Thus, it was decided that attention
to the three more familiar forms would be sufficient on the TIA test to
represent those discourse forms found at the middle grade levels. Exposition
is the primary type of discourse found in content area texts. Expository
materials are usually written using particular patterns of organization. The
patterns which are commonplace are enumeration, time order,
comparison-contrast, cause-effect, and problem-solution. Research shows that
proficient readers can identify the main ideas and supporting information in
expository materials (Taylor, 1982; Meyer, Haring, Brandt, & Walker, 1980).
Since expository materials make up a large percentage of what is read at the
middle grade level, it was concluded that any test of inference ability in
reading comprehension must contain an expository passage.

Similarly, narration is a familiar discourse form to middle grade students
and for that reason is an important segment of TIA. Within literature,
narrative pieces may be a creation of the author's imagination or may be
about real-life situations. Fiction makes up one of the largest categories
of children's and adolescents' literature and includes fantasies, science
fiction, adventure stories, mystery and detective stories, stories about
animals, historical fiction, and stories which deal with personal and social
issues. Narration is a discourse form commonly found in trade books and
literature programs.

Description, while used in ordinary and technical characterizations such
as house advertisements or computer manuals, is often used in conjunction
with the other discourse forms for effect.

The three full-length stories in TIA represent the common discourse forms
found at the middle grade levels, thereby providing a more thorough appraisal
of students' inference ability across a variety of reading materials than
tests which assess performance on either isolated passages and questions or
on one discourse form. Narrative, expository, and descriptive texts make
distinct demands upon readers, readers' knowledge, and expectations about a
task and have important consequences for cognitive processing and learning
(Anderson & Armbruster, 1984; Brewer, 1980; Spiro & Taylor, 1987). For
instance, narrative text is often argued to be easier to understand than
expository text for both adults and children (Bereiter & Scardamalia, 1982;
Bock & Brewer, 1985), possibly because readers are less familiar with how
expository texts are organized. Since the three discourse forms are an
integral part of programs which students at the middle grade levels are
expected to learn, differences in comprehensibility between narrative,
descriptive, and expository texts must be taken into account for diagnostic
purposes.

Topic Familiarity

The role played by background knowledge in reading comprehension has
attained such widespread acceptance that it no longer requires a
justification. The prior or background knowledge that a person brings to a
text is said to be one of the most important factors in understanding,
remembering, and interpreting text information (Anderson, Spiro, & Anderson,
1978; Ausubel, 1963; Holmes, 1983; Johnston, 1984; Pearson, Hansen, & Gordon,
1979). Furthermore, while topic familiarity or possession of requisite domain
knowledge does not necessarily guarantee interest, it does affect the
readability and comprehension of text (Phillips, 1987; Walker, 1987). Topic
familiarity is seen to be necessary, but not sufficient, for comprehension.

Background knowledge alone is not sufficient for reading comprehension
because a reader must know how to use that knowledge and must use that
knowledge. This is a particularly relevant point in the appraisal of
inference ability because a reader must seek a complete interpretation that
is consistent with both the text information and background knowledge in
order to make good inferences. Since readers must integrate background
knowledge with the text information to infer, to try to separate
background knowledge from the text information is to deny the role of
background knowledge in reading comprehension. It is not clear what readers'
performance on such a test would mean. Therefore, it is necessary to assess
what prior knowledge students have in order to make accurate appraisals of
their inference ability in reading comprehension.

Furthermore, since it is an objective of TIA to serve as a diagnostic
tool, it must be realized that readers do not always integrate
completely text information and background knowledge. Sometimes readers
integrate only some of the relevant text information and background
knowledge; other times, readers will select relevant text information and
background knowledge, but fail to integrate the two. There are occasions
where readers fail to do any of the above and as a consequence fail to make
an inference, go off course in their interpretation, or make unwarranted
assumptions.

A multiplicity of approaches was undertaken during the development of TIA
to establish accurate estimates of middle grade readers' topic familiarity.
A more thorough discussion of the procedures will be presented in a
subsequent section, "Item Development". At this point, it is sufficient to
say that a group of graduate students were asked to list ten topics which
they felt their students were interested in; one hundred and thirty teachers
attending a workshop were asked to list ten topics that they felt their
students were interested in; three hundred middle grade students were asked
to list ten topics that they thought they could write about without
difficulty; and twelve middle grade classes were selected to discuss some of
the topics and to write about them. From these information sources three
topics were seen to be common areas of interest and within the background
knowledge of the intended audience of the TIA test.



Readability 

The readability of text is generally assumed to refer to its legibility,
ease of reading, and ease of understanding. Many readability formulae have
been developed over the years, but they have not been without criticism.
Traditional readability formulae have been criticized for having no point of
reference (Manzo, 1970), for neglecting the importance of the structure,
texture, and informational density of text (Amiran & Jones, 1982), and for
lacking face validity (Coupland, 1978).

Alternative ways of estimating readability have been proposed, including
the subjective text difficulty approach by Tamor (1981), the psycholinguistic
approach by Holland (1981), and the conceptual approach by Rubin (1981).
Tamor combines text-based information (readability estimates) and
performance-based information (recall scores) to come up with a subjective
text difficulty level for individual readers. Holland's psycholinguistic
alternative focuses on assessing the meaning-making demands placed upon
readers by the language and structure of the text. Rubin's conceptual
approach focuses on the concepts conveyed by the text: how arguments are
presented, the role of examples in a text, and how characters' interactions
are developed and described.

I weighed and balanced the available evidence and decided not to use a
readability formula in the traditional sense. Rather, I chose to use what
may be described as a composite of both the traditional and alternate
approaches to readability. That is, I chose to adhere to the principles of
good story writing as well as to call upon the audience of my test as the
primary source for deciding upon readability.

The TIA test was written on three topics identified to be familiar to
middle grade students. In the preliminary pilot studies, students were asked
to read the stories aloud and to point out areas of difficulty. When the
areas identified to be problematic were revised, the text was judged to be
appropriate for the intended audience. In accord with Conditions I and II
given in Chapter Three, it was important that the stories and inference
questions be written coherently and adhere to the conventions of
communication, and that a reader be competent with the difficulty level of
the text, understand the demands of the task, and intend to understand the
text. Otherwise, readers' poor performance on the inference task may be
accounted for by a failure to meet these conditions, rather than by a lack of
inference ability.

Test Format

The Phillips-Patterson Test of Inference Ability in Reading Comprehension
(see Appendix A for a copy) contains three full-length stories (average of
465 words per story): a narration, an exposition, and a description. Stories
consist of four to five paragraphs and twelve scaled-answer, multiple-choice
questions. Questions follow each paragraph in the stories. Each question has
four answers provided. To answer the questions students are to use
information given in the story and information they already know. Students
are given an example which is thoroughly worked through so that they will see
that they are to consider all possible answers before deciding which answer
they think is the "best" one.

The challenge in changing reading assessment is to come up with new means
to evaluate our current conceptualization of reading and to diagnose areas
where instruction is needed. Reading comprehension admits of degrees.
However, credit has generally been given on most tests of reading
comprehension for one and only one correct answer. There has been no
allowance for partially correct responses, that is, for evidence that a
student may be capable of selecting relevant information without quite
knowing what to do with it.

The challenge in the design of TIA was to provide diagnostic information
about students' performance and to use that information to support
instructional improvement. To achieve this end, TIA represents a creative
molding of the old and new. The old format of selecting an answer is there
with the new advantage of giving credit for answers that are not completely
correct. TIA may be described as a "scaled-answer multiple-choice" test,
since it attempts to account for variations in understanding. The four
answer choices represent a range in values (0 - 3) assigned according to the
quality of the answer selected. An answer that is consistent with both the
text information and background knowledge is worth 3 points; a partially
correct answer is worth 2 points; a text-based answer is worth 1 point; and
an erroneous answer is worth 0 points (a complete copy of the scoring guide
is provided in Appendix B).
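
The scaled-answer scheme can be illustrated with a minimal Python sketch. The
point values (3, 2, 1, 0) come from the description above; the category
names, the `score_test` helper, and the two-item answer key are hypothetical
and stand in for the actual scoring guide in Appendix B.

```python
from enum import IntEnum


class AnswerQuality(IntEnum):
    """Point values from the text; the category names are illustrative."""
    ERRONEOUS = 0          # erroneous answer
    TEXT_BASED = 1         # relies on the text information only
    PARTIALLY_CORRECT = 2  # partially correct answer
    COMPLETE = 3           # consistent with both text and background knowledge


def score_test(responses, key):
    """Sum scaled scores.

    `responses` maps item id -> chosen option letter.
    `key` maps item id -> {option letter: AnswerQuality}.
    """
    return sum(int(key[item][choice]) for item, choice in responses.items())


# Hypothetical two-item key (not taken from Appendix B).
key = {
    1: {"A": AnswerQuality.TEXT_BASED, "B": AnswerQuality.COMPLETE,
        "C": AnswerQuality.ERRONEOUS, "D": AnswerQuality.PARTIALLY_CORRECT},
    2: {"A": AnswerQuality.PARTIALLY_CORRECT, "B": AnswerQuality.ERRONEOUS,
        "C": AnswerQuality.COMPLETE, "D": AnswerQuality.TEXT_BASED},
}

responses = {1: "D", 2: "C"}       # one partially correct, one complete answer
total = score_test(responses, key)
max_total = 3 * len(key)           # every item answered at the 3-point level
print(f"{total} of {max_total} points")
```

Unlike right/wrong scoring, a student who picks the partially correct option
on every item still earns two thirds of the available points, which is the
diagnostic distinction the format is meant to preserve.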

Test Length 

The current version of TIA may be administered in a class period (fifty
minutes). This allows time for giving instructions and the actual
test-taking time. It is intended to be a power test, so students are given a
reasonable time to complete all items. From teacher reports it appears that
the average test-taking time is thirty minutes, so this may be used as a rough
guide if teachers wish to use it as a speed test.

Item Development

Selection of Topics






Differences in background knowledge among students and graders can be
manifested in different ways. These differences may lead to variance in
performance on reading comprehension tests and hence to invalid
interpretations of students' performance. It is desirable that the world
views, or empirical beliefs needed to interpret a story on which test items
are based, be ones that most students share. If scores on TIA are to be taken
as measures of inference ability in reading comprehension, then it is
necessary to reduce as much as possible the effects of background knowledge.
To minimize differences in performance which might be due to differences in
background knowledge rather than to differences in inference ability, items
were selected on the basis of their familiarity to students at the grades
six, seven, and eight levels. Sensitivity to this issue of background
knowledge led to a comprehensive study of topics for potential selection for
item development. The six stages of the study are described next.

Stage One: Graduate Students. The first stage involved eight graduate
students with a diversity of teaching experiences. Each student was asked to
list ten topics which he or she felt students in grades six, seven, and eight
would be interested in and have knowledge of, and to provide a justification
for the choices. These are the ten topics which the graduates identified most
frequently: travel, space, videos, sports, animals, money, friends, future,
styles, and science fiction.

Stage Two: Teachers. In the second stage, 130 middle grade teachers
attending a professional development workshop were asked to list ten topics
which would interest their students and to justify their list. The teachers
identified the following topics most frequently: cars, space, money, science
fiction, holidays, music, mystery, computers, sports, and hobbies. The
graduate students' and teachers' lists overlapped on the following topics:
space, sports, money, future, and science fiction.
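
The comparison of the two lists can be sketched as a set intersection. This
is only an illustration in Python using the two top-ten lists exactly as
printed above; the variable names are assumptions. On a strict match the
printed lists share four topics, and "future" appears only in the graduate
students' list, so the reported overlap may rest on a looser matching of
topics than an exact comparison of labels.

```python
# Top-ten topic lists as printed in Stage One and Stage Two.
graduate_topics = {
    "travel", "space", "videos", "sports", "animals",
    "money", "friends", "future", "styles", "science fiction",
}
teacher_topics = {
    "cars", "space", "money", "science fiction", "holidays",
    "music", "mystery", "computers", "sports", "hobbies",
}

# Topics named by both groups (exact label match only).
overlap = graduate_topics & teacher_topics

# Topics named by the graduate students but not by the teachers.
only_graduates = graduate_topics - teacher_topics

print("Shared topics:", sorted(overlap))
print("Graduate students only:", sorted(only_graduates))
```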

Stage Three: Students (Topic Identification). In the third stage, the
ideas of middle grade students were sought. Three hundred students at grades
five, six, seven, eight, and nine were asked to list ten topics they thought
they could write about without difficulty. Grades five and nine were taken
in addition to grades six, seven, and eight to account for potential
differences at the upper and lower limits of reading ability in the intended
test group. What follows are the most preferred choices of the students:
money, space things, sports, pets, getting out of school, holidays, movies,
space friends, war, and travel. Overlap in topic choices among all three
groups indicates that the most popular choices are money, space or
space-related topics, sports, getting out of school, holidays, and pets.

Stage Four: Students (Unassigned Written Essay). In the fourth stage,
middle grade students were asked to write on a topic of their choice. This
was done to distinguish topics which students would choose to write about
from those that might sound exciting but about which they would be unlikely
or unable to write. Twelve classes of students in grades six, seven, and
eight were asked to choose from the most common topics identified up to this
point (money, space or space-related things, sports, pets, getting out of
school, holidays) or any other topic and to write an essay.

The essays were generally about space, money, and pets in one way or
another. Specific differences existed within the general topics; for instance,
essays about pets varied from the time it takes to care for them to how pets
are wonderful friends. Bearing in mind that each story on TIA was to be
representative of the reading materials at the middle grade levels, from
the most popular student topics three topics were selected: UFOs, Money, and
a Newspaper Mystery.

Stage Five: Students (Assigned Written Essay). In the fifth stage,
sixty-five students in grades five through nine were asked to write a story
about UFOs, Money, and a Newspaper Mystery. These essays were studied for
vocabulary choice, sentence and idea complexity, and form.

The sixth and final stage of topic selection went through three phases
involving free recall and word associations, recognition, and structured and
unstructured questions and discussions on each of the three topics. These
phases represent a synthesis of research on assessing background knowledge
(Adams & Bruce, 1980; Anderson, Spiro, & Anderson, 1979; Holmes, 1983;
Holmes & Roser, 1987; Pearson, Hansen, & Gordon, 1979; Spilich, Vesonder,
Chiesi, & Voss, 1979; Walker & Yekovich, 1984) and are taken to be some of
the best available ways to assess background knowledge. It took about five
class periods to establish students' background knowledge of each topic.

In phase one, students were asked to free recall or to brainstorm on each
of the topics. They were directed to think of all the information they would
expect to find in a story about UFOs, a story about Money, and a Newspaper
Mystery. Also, students were asked to come up with associations for
UFO-related words like heavenly bodies, evidence, and explanations; for
Money-related words like uses, forms, characteristics, and changes; and for
Newspaper-related words like responsibilities, carrier, weather, and
newspaper-related confusions.

The second phase involved recognition activities to identify any
misunderstandings which students might have about each of the three topics.
These activities were developed from the phase one discussions. For example,
it became evident that some students thought that scientists know what UFOs
are and that UFOs are meteors and "stuff like that" in the sky. Students were
asked to identify from a prepared sheet dealing with these matters several
possible correct and incorrect answers to questions such as "What are UFOs?"
"Is a meteor a UFO?" "Would there be UFOs if scientists know what they are?"
The answers provided to these kinds of questions led into the final phase of
topic selection.

Discussions guided by structured and unstructured questions completed the
final phase of establishing topic familiarity. A structured question on the
Money topic, for instance, was "What is money?" Such questions led to
unstructured questions about the topic such as "Do all jungle tribes have
money?", and lively discussions were held with the students on each of the
three selected topics.

For the purposes of this project, students were assumed to have a
sufficient amount of topic familiarity if they were able to speak to each
topic according to a general outline as follows:

UFO Outline

I. What UFOs are believed to be
II. What UFOs are reported to look like
III. Where UFOs are reported to come from
IV. Available evidence
V. Why UFOs are studied

Money Outline

I. What money is
II. The characteristics of money
III. How money developed
IV. Functions of money
V. Why the form of money changes

Newspaper Delivery

I. Carrier's responsibilities
II. Knowing the route
III. The importance of time
IV. How to deal with people
V. Potential problems

Furthermore, students were deemed to have sufficient relevant background
knowledge if at least seventy percent of them were able to speak to these
outlines. This provides a rough, but adequate for testing purposes, control
for differences in background knowledge for each topic. The chance of
systematic bias against any student across all topics is minimal.
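
The seventy percent criterion amounts to a simple proportion check, sketched
below in Python. The function name and the class counts are hypothetical and
serve only to illustrate the rule; no such counts are reported in this
chapter.

```python
def topic_is_familiar(num_able, num_students, threshold_percent=70):
    """Return True if at least `threshold_percent` of students could speak
    to the topic outline. Integer arithmetic avoids rounding surprises."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    return 100 * num_able >= threshold_percent * num_students


# Hypothetical counts for one class of 30 students (illustration only).
able_to_speak = {"UFOs": 24, "Money": 21, "Newspaper Delivery": 20}

familiar = {topic: topic_is_familiar(n, 30) for topic, n in able_to_speak.items()}
for topic, ok in familiar.items():
    print(f"{topic}: {'sufficient' if ok else 'insufficient'} background knowledge")
```

In this made-up example, 21 of 30 students is exactly seventy percent and so
meets the criterion, while 20 of 30 does not.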

The comprehensive information gathered from the topics identified, the
students' written stories, and class visits guided the choice of topics for
the stories in the TIA test. Three stories were written for the TIA test:
UFOs, Money, and The Wrong Newspapers. The UFOs story was modified from
previous research projects (Beebe & Phillips, 1980; Phillips, 1985) and
continues to be a winner among students. It is a story about unusual
phenomena, telling of different UFO reports, offering plausible explanations
for some of the reports, and suggesting that with improved technology we may
be able to explain UFOs. The Money story is a description of the everyday use
of money, of how it works, as well as its historical development. The third
and final story is a mystery entitled "The Wrong Newspapers" which involves a
mix-up in newspaper delivery, with the culprit being the neighbor's dog.




Principles of Story Writing



Story grammars have been developed to illustrate underlying text
structures. The most common types of text used in the middle grade levels
are narrative, descriptive, and expository. Each is organized in a
particular way and it is believed that children use the structure, once they
have it internalized, to assist them in understanding and recalling
information (Thorndyke, 1977; Stein, 1983; van Dijk & Kintsch, 1983).

There is overlap in the classifications of narrative, expository, and
descriptive structures since all three may be found in the one story. The
Wrong Newspapers story fits more within the narrative classification than
either the expository or descriptive. However, the UFOs and Money stories are
harder to classify because they overlap considerably the expository and
descriptive forms.

The principles of story grammar were followed in writing the TIA mystery
narrative entitled "The Wrong Newspapers". The principles may be summarized
as follows: there should be a setting which introduces the characters and
provides the time and place of the story; an initiating event should occur
which sets the story in action; there should be a response to that action
followed by an attempt to achieve a goal or to respond to an action; the
consequences of that attempt woven with a reaction are provided. These
principles, coupled with a sensitivity to vocabulary choice, sentence
structure, and sentence length, were in our minds during the story writing
process. Students' reading of the stories was then used to make final
judgments on topic and story suitability.
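
The story grammar components summarized above can be represented as a small
checklist. The following Python sketch is an illustration only: the class,
the field names, and the draft content are assumptions and do not reproduce
the procedure actually used in writing "The Wrong Newspapers".

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class StoryGrammar:
    """One slot per component named in the summary of story grammar."""
    setting: Optional[str] = None           # characters, time, and place
    initiating_event: Optional[str] = None  # sets the story in action
    response: Optional[str] = None          # response to the initiating event
    attempt: Optional[str] = None           # attempt to achieve a goal
    consequence: Optional[str] = None       # outcome of the attempt
    reaction: Optional[str] = None          # reaction to the consequence


def missing_components(story: StoryGrammar) -> List[str]:
    """List the components a draft has not yet filled in."""
    return [f.name for f in fields(story) if not getattr(story, f.name)]


# Placeholder draft (hypothetical wording, not quoted from the test).
draft = StoryGrammar(
    setting="a newspaper carrier on a delivery route",
    initiating_event="customers receive the wrong newspapers",
)
print(missing_components(draft))
```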

Narrative provides a resolution or stopping point and therefore it is
easier to identify its underlying structure or grammar than is commonly the
case in expository material. Expository material has six underlying
structures: serial; topic; restriction and illustration; definition;
argumentation; and comparison-contrast. The serial pattern may be considered
the generic basic structure since the others are more general secondary
structures. These five structures (topic, restriction and illustration,
definition, argumentation, and comparison-contrast) are more perplexing than
the serial structure from another perspective because within a serial pattern
there may be occasions when other structures are used. Consider the case in
a social studies text where forms of travel are being studied in
predominantly a serial fashion, but for a couple of paragraphs modes of
travel are compared and contrasted, followed at the end of the chapter with a
generalization about the most efficient means of travel. So, it is common to
see much overlap among text structures.

The TIA stories on UFOs and Money were written following primarily a
serial pattern: a general concept is presented in each story;
generalizations combined with examples are stated; a sequence of events
unfolds; a conclusion follows. Care was taken to ensure that vocabulary
choices were either known or explained and that sentences were coherent. The
UFOs, Money, and The Wrong Newspapers stories were further subjected to the
Anderson and Armbruster (1984) test of understandability: do the stories
provide enough relevant information to achieve the author's purpose and to be
meaningful to their readers? The evidence from the groups of students in
grades six, seven, and eight who read and discussed the stories is that the
test of understandability was passed.






CHAPTER FIVE
THE EVOLUTION OF THE PHILLIPS-PATTERSON
TEST OF INFERENCE ABILITY IN READING COMPREHENSION

The present form of TIA (see Appendix A) represents six phases of
evolution in design and development. This chapter will provide details of the
modifications at each phase of the test development as well as a rationale
for them. The test validation techniques that guided the design and
development of TIA will also be described.

Experimental Test Versions
Preliminary Test Version

The Preliminary Test Version contained three stories (UFOs, Money, The
Wrong Newspapers) and forty-eight short-answer questions. It was given to a
graduate reading class and a colleague who was engaged in the development of
a test of inductive reasoning in critical thinking.

On the basis of feedback from these two sources the test was edited and
written more concisely. The number of test questions was reduced from 48 to
36 because the test was too lengthy even for these subjects. This task was
made simple for two reasons: (i) 5 of the twelve questions were judged to be
insufficiently related to the stories to allow for complete and consistent
inferences to be made, and (ii) 7 of the twelve questions could not be easily
identified as inference questions, so in cases of doubt the questions were
dropped from the test.

Pilot Study One (Short-answer Version)

Trial One 

Following the completion of the revisions to the Preliminary Test Version,
the first pilot of TIA was conducted. A short-answer format, rather than
multiple-choice, was used to help understand what the questions measured. In
addition, this trial administration was done to check on test length,
passage difficulty, vocabulary choices, clarity of instructions, and
question ambiguities. The effects of story order were also studied. Research
shows that narrative text is generally easier to understand than expository
text. If this is so, then differences in performance could be expected if the
order of presentation of discourse types was altered. If differences were
identified, then story order would be an important consideration in
subsequent test development.

Sample and Procedure. Sixty-five students in grades six, seven, and eight
participated. Test booklets were distributed randomly to students with the
three stories (UFOs, Money, and The Wrong Newspapers) collated in the six
possible story orders. The directions and sample paragraph and inference
question were discussed with the students. Students were told that they
would have to use their background knowledge and the text information to
answer the questions, that they would read three stories, and that each story
would have four or five paragraphs and questions on each paragraph. Students
were directed to read each paragraph, to write their answer to each of the
corresponding questions, and to justify their answers. Once all student
enquiries were answered, the test was started.

Results. The pattern of student responses to the inference questions was
one of the most significant findings. Students' answers were of four types:
an implausible response; a non-inference response; a partially-correct
inference response; or a complete inference response (a more detailed
description of these may be found in Appendix B).

It was found that the TIA test was too long, since it took on average one
and one-quarter hours to complete, after directions had been given and the
sample item worked through.

No differences in performance on the basis of story order were found.
Students may have acquired new knowledge while taking the test; however, it
did not add to nor detract from their performance when story order was
altered.

Test Revisions. From an examination of students' answers numerous
revisions were made: (i) six questions were reworded to make their meaning
more clear and one question was deleted because of ambiguity; (ii) three
questions in the UFOs story (#8, 9, 10) were re-sequenced as #10, 8, 9 to
match the text sequence; (iii) sentences in the text judged to be too
similarly worded to the corresponding inference questions were deleted; (iv)
some sentences were modified to be more general and less explicit, thereby
making the corresponding inference questions more challenging; and (v) other
sentences were changed to clarify meaning.

Upon completion of the revisions based on the results of trial one, the
number of questions on TIA for the trial two study was 36, 12 for each story.

Trial Two 

Trial two of Pilot Study One (short-answer version of TIA) was done to
confirm whether or not the four types of responses identified in the student
protocols of Trial One would be upheld. The concern was whether the kinds of
responses given by the students in Trial One were an artifact of the test,
and whether the revised version would yield similar results. If the four types
of responses were upheld, then subsequent test development would have to take
these response variations into account, if test performance was to be taken
as a valid indication of student ability.

Sample and Procedure. One hundred students in grades six, seven, and eight
took the short-answer version of the TIA test. The same procedure was
followed as in the previous trial.

Results. Students' written responses and accompanying justifications
for their answers were studied. The trend of reader response variations
identified in Trial One was evident in the responses on this trial. Student
responses for each question fell within one of four response patterns
identified in Trial One. The four variations (an implausible response; a
non-inference response; a partially-correct inference response; or a complete
inference response) in student performance became a major factor in the
future design and development of TIA.

Test Revisions. Since a multiple-choice format for TIA was the ultimate
aim, the fourth version of TIA involved writing a scaled-answer multiple-
choice set for use in the second pilot study. The questions on the modified
version from Pilot Study One were changed to sentence stems, in order that
the item form on the short-answer and multiple-choice versions of the test
would be identical. For example, the question "Why do many people mistake
heavenly bodies to be UFOs?" on the short-answer version became the sentence
stem "Many people mistake heavenly bodies to be UFOs because..." on the
multiple-choice version.

It was presumed that the sentence stem format might help to reduce writing
time from the 90 minutes needed on the short-answer version to one class
period. Distractors for the multiple-choice version of the test were taken or
molded from students' answers on the written short-answer versions from Pilot
Study One. Each set of four possible answers was scaled as follows: an
implausible response worth 0, a non-inference response worth 1, a
partially-correct inference worth 2, and a complete inference worth 3. This
"scaled-answer format" was used to afford students the option to select the
type of response they would likely make if they were taking the short-answer
version of TIA. Multiple-choice items were constructed such that distractors
were of consistent grammatical style and vocabulary, and of equal length.
Keyed answers were randomly selected for position placement (A, B, C, or D).
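
The scaled-answer format described above can be illustrated with a short
sketch. It assumes that all four options are shuffled into positions A to D
(the report says only that keyed answers were placed at random), and the
function and variable names are illustrative rather than taken from the
report. The option wording is that of the UFOs question used later in this
chapter to illustrate scoring.

```python
import random

# Point values for the four answer types described in the text.
SCALE = {
    "implausible": 0,
    "non-inference": 1,
    "partially-correct": 2,
    "complete": 3,
}


def build_answer_set(options_by_type, rng):
    """Place the four options in random positions A-D.

    Returns {letter: (option_text, score)}. Shuffling every option is an
    assumption; the report only states that keyed answers were positioned
    at random.
    """
    types = list(options_by_type)
    rng.shuffle(types)
    return {
        letter: (options_by_type[kind], SCALE[kind])
        for letter, kind in zip("ABCD", types)
    }


def score_choice(answer_set, letter):
    """Return the scaled score (0-3) for the option a student selected."""
    return answer_set[letter][1]


# Stem: "UFOs are sometimes called other names because"
q1_options = {
    "complete": "people name them according to their shape or probable origin.",
    "partially-correct": "people don't know what to call them so name them by shape.",
    "non-inference": "people see an area with many coloured lights in the sky.",
    "implausible": "people know they are unidentified flying objects in the sky.",
}

q1 = build_answer_set(q1_options, random.Random(0))
keyed_letter = next(letter for letter, (_, score) in q1.items() if score == 3)
```

Selecting `keyed_letter` earns 3 points, while any other letter earns the
value attached to its answer type, so a student can receive partial credit
for a partially-correct inference.
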
Pilot Study Two (Short-answer/Scaled-answer Multiple-choice Versions)
The second pilot study was conducted to serve four purposes: (i) to
examine the degree of similarity of performance on the short-answer and the
scaled-answer multiple-choice formats; (ii) to compare completion times
required by both test formats; (iii) to corroborate whether the four patterns
of responses identified in Pilot Study One would be displayed by the students
in this pilot; and (iv) to identify potential item ambiguities, vocabulary
difficulties, and other problems.
Sample and Procedure 

Eighty-one students in grades six, seven, and eight participated in Pilot
Two. Forty students wrote the short-answer version, and the remainder wrote
the multiple-choice version. The same procedure described in previous pilots
was followed, with one exception. The students taking the multiple-choice
version were provided with answer choices. The students were cautioned to
consider all possible answers before deciding the answer they thought was the
best one.

Results of Multiple-choice Format

Test completion time ranged from 50 to 75 minutes on the multiple-choice
format for classes in grades six, seven, and eight. This represented an
average reduction of ten minutes over the short-answer format, a smaller
reduction than was expected.






Item analysis of the multiple-choice format showed a KR-20 reliability of
0.68 and a test mean of 17.5 items correct out of a total possible 36 items,
with a standard deviation of 4.73. The test means for grades six, seven, and
eight were 14.0, 17.1, and 19.6 respectively. Item/test biserial correlations
and item difficulty indexes were computed and are presented in Table 5-1.
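
The reliability and difficulty statistics named above can be computed from a
matrix of right/wrong item scores. The following is a minimal sketch, assuming
that 1 means the keyed answer was chosen and 0 otherwise, and that the
population variance of total scores is used; the report states neither
convention, and the function names are illustrative.

```python
def item_difficulty(responses):
    """Proportion of students answering each item correctly.

    `responses` is a list of rows, one per student, each a list of 0/1
    item scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[j] for row in responses) / n_students
        for j in range(n_items)
    ]


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / variance of total scores).
    Assumes at least two items and non-zero variance in total scores.
    """
    n_students = len(responses)
    k = len(responses[0])
    p = item_difficulty(responses)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    # Population variance (divide by n); an assumption, see lead-in.
    variance = sum((t - mean_total) ** 2 for t in totals) / n_students
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return (k / (k - 1)) * (1 - sum_pq / variance)


# Tiny made-up data set: 4 students by 3 items (not data from the report).
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
```

For the made-up `example` matrix, `item_difficulty(example)` gives 0.75, 0.5,
and 0.25, and `kr20(example)` gives 0.75.
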

It can be seen from Table 5-1 that three of the thirty-six questions had
negative biserial correlations (questions 18, 20, and 35). Examination of
these three items coupled with students' short-answer responses revealed the
answer sets for questions 18 and 35 to be ambiguous. Question 20 required
students to consider a historical perspective, but it appears that most
students answered it from a current events perspective.

Questions 8 and 28 had very low biserial correlations. It was clear, upon
examination, that the problems with questions 8 and 28 were vocabulary-
related. It seems that many students did not note the relevance of particular
examples which were cited. For instance, "meteors" were cited as an example
of "astronomical events", but students did not see the relevance in answering
item 8, which read "Other kinds of astronomical events that people mistake to
be UFOs are". It seems students did not understand "astronomical events", so
it was replaced with "heavenly bodies". Revisions were made to all aspects of
the test identified to be either definitely or potentially problematic.
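
The screening of items by their item/total correlations can be sketched as
follows. The report computed biserial correlations; this sketch substitutes
the simpler point-biserial (a Pearson correlation between a 0/1 item score
and the total score), and the 0.15 cutoff is a hypothetical threshold chosen
only for illustration. The listed correlations are those of Table 5-1, items
1 to 36 in order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses):
    """Point-biserial correlation of each 0/1 item with the total score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [
        pearson([row[j] for row in responses], totals)
        for j in range(n_items)
    ]


def flag_items(correlations, threshold=0.15):
    """Return 1-based item numbers whose correlation falls below threshold."""
    return [j + 1 for j, r in enumerate(correlations) if r < threshold]


# Item/total correlations as read from Table 5-1 (items 1-36).
TABLE_5_1 = [
    .448, .528, .667, .378, .477, .710, .389, .072, .414,
    .327, .752, .844, .311, .217, .217, .206, .573, -.053,
    .340, -.153, .453, .544, .441, .203, .306, .795, .412,
    .101, .221, .271, .608, .301, .456, .560, -.316, .505,
]

flagged = flag_items(TABLE_5_1)
```

With this cutoff, `flagged` contains items 8, 18, 20, 28, and 35, the same
five items singled out in the discussion above.
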

The item difficulty levels also pointed to problems with questions 18 and
35 discussed in the preceding paragraph. Question 13 was among the more
difficult items; it seems that a word in the question stem was interpreted
differently by many students than by the test authors. The question read
"Money is needed in at least two different ways," and students interpreted
the question by focussing on the word "needed" as necessities. The test
authors intended [...] with the item was with the wording and not with how
the students interpreted it. Item 13 was revised to read "[...] because".

Table 5-1

Pilot 2, Item Statistics

        Item/Total     Item Difficulty           Item/Total     Item Difficulty
Item    Correlations   Index             Item    Correlations   Index

  1        .448           .341            19        .340           .366
  2        .528           .610            20       -.153           .366
  3        .667           .317            21        .453           .488
  4        .378           .293            22        .544           .561
  5        .477           .512            23        .441           .780
  6        .710           .512            24        .203           .171
  7        .389           .683            25        .306           .732
  8        .072           .195            26        .795           .902
  9        .414           .488            27        .412           .512
 10        .327           .463            28        .101           .537
 11        .752           .829            29        .221           .488
 12        .844           .805            30        .271           .585
 13        .311           .098            31        .608           .488
 14        .217           .561            32        .301           .463
 15        .217           .561            33        .456           .585
 16        .206           .244            34        .560           .463
 17        .573           .293            35       -.316           .098
 18       -.053           .073            36        .505           .583

Results of Short-answer Format

Students' responses on the short-answer format were examined and the type
of answer identified (implausible response, non-inference response,
partially-correct inference, complete inference). Again, the pattern of
student responses was consistent with the two previous trials under Pilot
One. This result was taken as compelling evidence that a valid test of
inference ability would have to allow for variations in student performance.

Student responses on the short-answer format were compared to the
multiple-choice key to assess the agreement between the number of responses
per item that received full credit. The results are presented in Table 5-2.
It should be pointed out that for purposes of this analysis an item on the
short-answer format was not considered correct unless it expressed the same
meaning as the keyed answer on the multiple-choice format; consequently the
percentages of agreement between the two are necessarily lowered. For
instance, consider item 30, which says "Ann wanted to hand deliver Mr. Jones's
newspaper because". The keyed response on the multiple-choice format and the
one required on the written short-answer format would be "to make sure he got
it and to talk to him about the mystery." So, unless students responded on
the short-answer version with a compound answer, they were not scored as
completely correct even though they may have been partially correct. A lower
percentage of agreement was found on those items that required students to
synthesize story information. It seemed that if questions required students
to pull together more than one piece of information to formulate a complete
response, then they experienced difficulties or did not consider all
available relevant information. Question 26, for instance, required the
synthesis of three pieces of information; however, the most common response
written on the short-answer test and selected on the multiple-choice test was
a partially-correct one.

Table 5-2

Pilot 2, % Agreement Between Full Credit Short-answer/Multiple-choice
Responses

Item    % Agreement       Item    % Agreement

  1         51             19         68
  2         46             20         11
  3         28             21         40
  4         32             22         33
  5         50             23         75
  6         39             24         54
  7         16             25         50
  8         11             26         10
  9         28             27         31
 10         59             28          8
 11         47             29         42
 12         79             30          0
 13         18             31         53
 14         18             32         49
 15         59             33         62
 16         27             34         63
 17         40             35         11
 18         43             36         45

The mean on the 36-item short-answer test was 12.97. A recognized
restriction of this pilot was that one and only one answer was deemed
acceptable, which undoubtedly ignores a range of answers which may have been
partially correct. Bearing in mind this restriction on acceptable answers,
it seems reasonable to expect that the level of overall performance may
have been reduced. The mean on the multiple-choice test was 17.15, which
reflects a significantly higher level of performance. Another explanation for
the lower performance on the short-answer test could be related to the fact
that students had to construct and write an answer, which would seem to be a
more demanding task than selecting an answer on the multiple-choice test.
Student performance on the multiple-choice test may be a "better" indication
of their reading ability than the short-answer test, where performance is
confounded with students' ability to express their ideas in writing. Also
contributing to lower scores was the fact that students tended to leave more
items unanswered on the short-answer test than on the multiple-choice test.
Test Revisions 

Passage, question, and answer modifications were made to TIA prior to the
next pilot. Revisions were made to each of the three stories. For instance,
on the UFOs story, it was found that students failed to attend to the word
"not" in the sentence, "Many of the older reports are not complete so we need
to continue to study UFOs", consequently leading to an erroneous response.
The sentence was modified to "Many of the older reports are incomplete so we
need to continue to study UFOs".

Some questions were replaced because they did not require students to make
inferences, and some answers to other questions were replaced because of
ambiguity. Other revisions included substitutions in word-choices and changes
in information placement. Further revisions included making answer selections
more parallel with one another. For instance, "plastic cards" was replaced
with "club cards" in item 23 to make it more parallel with the other options:
"trade items", "credit cards", "chocolate bars".

Pilot Study Three (Verbal Reports)
Pilot Study Three was conducted using a verbal report methodology. Verbal
reports were used as a method to validate whether a complete inference had
been made when students selected the keyed answer, to help ensure that
multiple-choice test questions were functioning effectively as inference
questions. In addition, such an approach is particularly useful in test
development for revealing potential item ambiguities, vocabulary problems,
and hidden cues.

Care was taken to develop interview procedures which would not jeopardize
the quality of information to be collected and conclusions to be drawn. Two
trial verbal report sessions with six students each were held to ensure that
the two interviewers understood the demands and limits of the approach, as
well as to determine whether the information needed from the students was
being acquired.
Sample and Procedure 

Thirty-six students in grades six, seven, and eight participated. Students
were each assigned to one of the three stories on TIA. They were told that to
answer each question they would have to use information given in the story
and information they already knew. They were told that the story would not
directly answer the questions and they would have to use their common sense
and the story information. Students were advised to consider all possible
answers before deciding which answer they thought was the best one. A sample
item was worked through with them. Once the sample item was completed and
students' questions were answered, students were asked to read aloud each
paragraph, to read the corresponding test questions, to select one of the
four answers provided, and to tell why they thought that answer was the best.

Interviewers questioned students only if there was a lack of clarity in a
response, such as an unspecified pronoun antecedent or an answer so terse or
vague that it was too incomplete to follow. At the end of the test interview,
general questions were asked about students' interest in the story, about
whether the passage vocabulary gave them any problems, and about whether
there were other things that were unclear to them. Each verbal report
protocol was transcribed and a scheme developed to code the quality of
students' responses.
Scoring Responses

In order to reflect the range of responses shown by students in the verbal
reports, a scoring system was devised to allow credit for partially correct
as well as complete inferences. Scores from 0 to 3 were assigned on the basis
of the range of completeness of the student responses. See Appendix B for
the criteria for grading the test of inference ability.

The following question (Q1) and its possible answers (A, B, C, or D)
illustrate the scoring system.




Q1 UFOs are sometimes called other names because

(A) people name them according to their shape or probable origin.

This answer is a complete inference and, therefore, is given a score of
(3). The relevant textual information was contained in sentence three,
"People sometimes call UFOs flying saucers, spaceships from other planets,
and extraterrestrial spacecraft". Using background knowledge it can be
concluded that the naming criteria for UFOs in this story are based on either
shape ("saucers") or probable origin ("other planets" and
"extraterrestrial"). The integration of the relevant textual information and
background knowledge makes (A) the best inference response for question 1.

(D) people don't know what to call them so name them by shape.

This answer is given a score of (2). It is a partially correct inference
for question 1 because it only considers one of the naming criteria, shape,
when the textual information supplies two criteria. The criterion of shape
was selected for this alternative instead of the criterion of origin because
shape was focussed upon in all instances of explanations by students in the
verbal reports.

(C) people see an area with many coloured lights in the sky.

This answer is given a score of (1). It is based on textual information
from sentence five. However, the relevant textual information is contained in
another area of the text (sentence 3). Although the textual information
selected deals with the appearance of UFOs, it is not the most relevant part
of the text.

(B) people know they are unidentified flying objects in the sky.

This answer is scored as (0). It is the least correct answer because it
makes no sense either in relation to the text, or in relation to background
knowledge. People do not know for certain that what they see are
unidentified flying objects, and this is not the reason given in the text
that synonyms exist for UFOs.
Answer Set Revisions

The process of revising answer sets based on students' verbal reports had
two complementary facets. One facet dealt with editing existing answer sets
and the other with developing new answer sets which would reflect the range
of answers students gave in their verbal reports.

Answer sets were revised where students' explanations of their choice of
answer showed either that students made an inference but still selected a
less than best answer, or that they used inadvertently placed cues in the
answer set to select the best answer. The second facet is discussed in the
next section with question revisions.
Vocabulary and Question Revisions 

A number of terms which students did not understand became apparent in the
verbal report data. Samples of vocabulary revisions include the following
substitutions: "scientific equipment" for "technology" and "heavenly bodies"
for "astronomical events". Care was taken to maintain the intent of the text
and use of precise terms while substituting appropriate vocabulary for
students at the grades six, seven, and eight levels. For example, in looking
for a substitution for "astronomical events", children's science texts,
children's science encyclopedias, and science reference books were consulted.

Eleven questions were deleted from this test version. Five of the eleven
questions were judged to be based too heavily on students' background
knowledge. The five questions did not meet the principle that a good
inference question is one that requires a reader to integrate relevant text
information and background knowledge to construct complete interpretations
that are consistent with both the text information and background knowledge.
Four of the five questions required students to give answers based on word
knowledge. For instance, one of the items stated "The word independence in
this story [...]" [...] without reading the text. In another item, students'
lack of background knowledge hampered them in making a complete inference, so
the question was deleted. The deleted item read "Money might be more risky to
use than credit cards because", but according to students' verbal reports,
they did not know that credit cards could be cancelled, and were therefore
less risky to lose than money.

The remaining six questions were deleted for a variety of reasons.
Difficulty level indices from previous pilots indicated question 13 as one of
the most difficult questions (see Table 5-1). Student verbal reports
indicated differences in word interpretations from those intended by the
authors. For instance, item 13 says "Money is reused by"; it seems students
interpreted "reused" to refer to the same money being used over and over or
saved by a single individual and not the circulation of money.

Questions requiring students to make time frame shifts were identified to
be problematic as evidenced in their verbal reports and item analysis
results. For example, students' verbal reports showed that they responded to
item 20, which read "Years ago, cows, coffee, and shells did not keep their
value as well as money today because", from a current events perspective. A
typical response was "they are not wanted by everyone, whereas money might be
because if you traded with people from the city they might not need cows."
Item 20 had gone through three revisions and yet students seemed to focus on
the current rather than the past, so the item was deleted. The remaining two
questions did not function well as inference questions because they were
judged to be too text-dependent, so they were deleted.

Students' verbal reports also pointed to item ambiguities (items 9, 18,
and 35). For example, on item 9, students interpreted "find out" to mean
discover new facts, when the intended meaning was "learn". So the item stem
"Using the reported information we would find out the most about UFOs by" was
changed to "Using available information people learn the most about UFOs by".
This modification required students to make the inference that the "available
information" was the reports described in the story.
Story Passage Revisions 

The final section of test revisions in this pilot deals with the story
passages. The major change was with the "Money" story. Because the first
five inference questions in the "Money" story were deleted, the
first two story paragraphs were also deleted. Two new paragraphs and five
new inference questions on the functions and characteristics of money were
written for the "Money" story.

Minor changes were made to other paragraphs through deletion and addition
of sentences. Sentences were added to story paragraphs in instances where
more textual information was required for a specific inference question or
where a new test question had been added. For example, the sentence "Weather
conditions are checked when scientists study available information about
UFOs" was added to the second UFOs paragraph to complement the question
"Weather conditions affect UFOs sightings in the sky because" (UFOs Q5). The
sentence in the third paragraph of the "Money" story "Large animals made
trade difficult because there was too much price difference in items" was
deleted. There was insufficient story information about trade items for
students to answer the corresponding inference questions. Changes of
specific vocabulary in story paragraphs were discussed under editing of test
vocabulary. Remaining changes were cosmetic in nature.

Pilot Study Four (Expert Ratings)
The revised scaled-answer multiple-choice test was given to two fourth
year college classes in the Faculty of Education at Memorial University of
Newfoundland. Sixty-one students participated in this pilot. The purpose was
twofold: first, to have an expert adult sample confirm the researchers'
rating decisions for young students' responses on the TIA test; and second,
to have the experts take the test and to note on it any questions or answers
which they found to be ambiguous, and to make comments on any aspect of the
test where they felt revision might be necessary.

It is realized that so-called experts may be quite unreliable judges of
items written for younger students because the adult conception of what is
and is not familiar may be quite different from that of younger students.
For example, the item "Money might be more risky to use than credit cards,"
required students to know that credit cards may be cancelled. A study of
students' verbal reports revealed this to be a piece of information which
they did not know. Consequently, while adults consistently made a complete
inference on this item, the middle grade students never did. The item was
dropped because it did not measure students' inference ability.

For 85 percent of the items the experts rated the young readers' responses
consistent with the ratings assigned by the researchers. The remaining 15
percent were taken to need further revisions. In addition, comments and
queries made by the experts were studied and appropriate changes made.

The test mean for the two college classes was 23.95, out of 36 items, with
a standard deviation of 3.68 and a KR-20 reliability of 0.58. There seemed to
be distinct divisions among the expert sample about some of the items. For
instance, there were adults who wrote "there is no best answer here," on an
item that required them to synthesize two or more pieces of information. It
seemed some of the experts would indicate the right information needed for an
answer, but would not pull the information together to make a complete and
consistent inference. The remaining experts seemed to have little difficulty
making complete inferences consistent with those of the researchers. Thus,
the majority of the experts were taken to be reliable judges of the best
answers.

Pilot Study Five

Test Validation 

The fifth pilot study was designed to study the relationship between
students' answer selections and their thinking processes in making those
selections. One purpose was to find out the quality of students' thinking
when they selected their answer for each test item. Understanding students'
thinking processes is of fundamental importance because students often arrive
at good answers without thinking well and at less than good answers even
though they may have thought well. A second purpose was to find out whether
the verbal report process either improved or worsened students' performance.

Specifically, four issues motivated the validation procedure: (i) to find
out whether students understood the task, that is, that they were to use
information from the text and from their background knowledge to answer the
inference questions; (ii) to find out whether students understood each test
item and reasoned well when they picked the best answers; (iii) to find out
whether students who chose an incorrect answer to an item did so because they
did not reason well; and (iv) to find out whether there is a difference in
performance between the verbal report and written cohorts.

The challenge was to design and develop a test so that students make
inferences and that they do so for the "right" reasons. For a test of
inference ability to be valid, the test should require that students make a
complete inference when they select the best answer for an item (Phillips,
1986). One assumption in multiple-choice test construction is that when
students select the best answer for a test item, they do so for the right
reason. However, it is possible that students might select the best answer
for a test item without fully understanding it. For example, there may be
some inadvertent cue prompting students to choose the right answer. A second
assumption is that students who choose the incorrect answer do so because
they are not reasoning well, yet students might select an incorrect answer
for a good reason. For example, there may be an alternate interpretation from
that intended by the authors, leading students to choose a less than complete
answer even though they reasoned well. Thus, it is important to have students
explain their reasoning when they select their answer for each question.

Students' thinking ability was examined by having them verbally report why
they had selected their answers. These verbal report protocols were used in
conjunction with the students' answer selections to provide information for
test validation. The general principle followed was that tests would be valid
to the extent that good inference-making led to good performance and poor
inference-making led to poor performance.
Sample and Procedure 

One hundred and eighty-three students in grades 6, 7, and 8 at three
schools participated in this pilot. The students were selected at random
from intact classes and assigned randomly to either of two test conditions.
There were 95 students tested in the verbal report condition and 88 students
in the written test condition. Table 5-3 is a summary of the number of
students by school and grade.

Students in the written test condition wrote the multiple-choice test in
their classrooms. The same administration procedures described in Pilot Study
Two were followed. Two interviewers conducted the verbal report interviews
using the same procedure described in Pilot Study Three. Students in the
verbal report cohort were assigned to one of the three stories on a rotating
basis. That is, the first student was assigned story 1, the second story 2,
the third story 3, and the fourth student story 1, thus starting a repeat of
the cycle. The total administrations per story were as follows: 34 students
completed the 'UFOs' story, 32 students completed the 'Money' story, and 29
students completed 'The Wrong Newspapers' story.

Table 5-3

Pilot 5, Summary of the Number of Students by School and Grade Level in
Written and Verbal Report Cohorts

               Grade 6             Grade 7             Grade 8
School      Verbal  Written     Verbal  Written     Verbal  Written

1 (41)         5       6           8       8           7       7
2 (90)        12      16          15      15          17      15
3 (52)        18       8           7       7           6       6

N = 183       35      30          30      30          30      28
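
The rotating assignment described above can be sketched as a simple
round-robin over the three stories. This is only an illustration: it assumes
a single ordered list of students, whereas the report does not say how the
rotation was coordinated between the two interviewers. The names STORIES and
assign_story are hypothetical.

```python
# Round-robin assignment of verbal report students to the three stories.
# Assumption: students are taken in one fixed order (0-based index).
STORIES = ("UFOs", "Money", "The Wrong Newspapers")


def assign_story(student_index: int) -> str:
    """Return the story for the student at the given 0-based position."""
    return STORIES[student_index % len(STORIES)]


# The first four students receive stories 1, 2, 3, and then 1 again.
first_four = [assign_story(i) for i in range(4)]
```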
Coding 

Three sets of data were collected: reading scores from the written cohort,
and reading and thinking scores from the verbal report cohort. Written cohort
responses were scored according to criteria developed in Pilot Study Three
(See Criteria for Grading TIA in Appendix B). The reading score for an
answer ranged from 0 (implausible) to 3 (complete). Students' total reading
scores were the sum of the values assigned to all answers selected by
students. The total possible score is 108.

Verbal report explanations of students in the verbal report cohort were
assigned thinking scores. The quality of students' explanations for each
answer was rated according to specific criteria (See Appendix C for a copy of
the Thinking Rating Scale). Thus, for each item there was a reading score
for the answer selected and a corresponding thinking score for a student's
explanation of why that answer was chosen.

A trial sample of thinking protocols was selected at random from the three
stories and grades. Two raters independently assigned a thinking score to
each answer justification. Any inconsistencies between raters' scoring of
thinking protocols were studied. The initial rating of this small sample of
the verbal report protocols allowed changes in the category descriptions of
the thinking rating scale before all the protocols were scored. For
instance, it was observed that sometimes students simply repeated the answer
they had selected as their explanation. In the initial thinking rating scale
there was no provision for such a response. Consequently, a change was
necessary and a thinking score of (0) was assigned for such responses.




Both total and individual item thinking scores assigned by the two raters
were compared. Inter-rater reliabilities on both comparisons resulted in
correlation coefficients greater than .90. Any explanations assigned
different thinking scores by the raters were discussed and re-rated. With a
high level of reliability on the rating of students' explanations
established, it was concluded that the remaining protocols could be
consistently scored. About 25% of the remaining protocols were checked at
random, and found to have a similarly high level of inter-rater reliability
(0.91).

Data Analysis 

The data analysis examined six questions: (1) To what extent were
students' reading scores and thinking scores on each test item in the verbal
report cohort correlated? That is, did students who reasoned well select the
best answer and did students who reasoned poorly select an incorrect answer?
(2) To what extent were students' total reading and thinking scores for each
story in the verbal report cohort correlated? (3) How did students' reading
scores in the verbal report cohort compare with reading scores in the written
cohort? (4) How is performance on each item related to overall test
performance? (5) Did students' reading scores vary by grade level? and (6)
Were there interviewer effects on test performance?

Results and Discussion 

Reading and Thinking Relationships for Items. Table 5-4 presents
Pearson's correlation coefficients between students' reading and thinking
scores by test item. A positive correlation significant at less than the .05
level between reading and thinking scores was found for 34 of the 36 items
with an average correlation of .55. Reading and thinking scores for item 10
were not significantly correlated and for item 17 they were negatively
correlated. These two items were examined, but no problems were apparent. The
results of the previous pilot studies were examined and no indications of
problems with items 10 and 17 were found. The final decision was to leave the
items without changes and to examine them in the next trial.

Table 5-4

Pilot 5, Pearson Correlation Coefficients Between Reading and Thinking Scores
by Item

Item    Pearson's r         Item    Pearson's r

 1        .82**              19       .54**
 2        .62**              20       .45*
 3        .69**              21       .51**
 4        .77**              22       .39*
 5        .62**              23       .45*
 6        .80**              24       .48*
 7        .72**              25       .66**
 8        .54**              26       .45*
 9        .94**              27       .42*
10        .09                28       .53*
11        .68**              29       .72**
12        .50**              30       .37*
13        [illegible]        31       .61**
14        .62**              32       .42*
15        .30*               33       .38*
16        .52**              34       .83**
17       -.12                35       .54**
18        .77**              36       .61*

*p < .05   **p < .001

For 94 percent of the items good thinking was significantly correlated
with good reading and poor thinking with poor reading performance. This result
provides strong evidence that generally when students thought well they
selected the best answer and when students reasoned poorly they selected an
alternate answer. The significant correlations between reading and thinking
scores for items are one piece of evidence that TIA is a valid test of
inference ability.

Reading and Thinking Relationships for Stories. The reading and thinking
relationship for each item is by necessity related to this relationship for
each story. Twelve items accompany each story; therefore items 1-12 accompany
story 1 'UFOs', items 13-24 accompany story 2 'Money', and items 25-36
accompany story 3 'The Wrong Newspapers'. Table 5-5 presents Pearson's
correlation coefficients between total reading and thinking scores for the
three stories. The correlation coefficients were similar, high, and
significant at the .001 level for the three stories.

It is reasonable to conclude that students understood the items and that
students who selected the best answers thought well. Thus, the significant
reading and thinking relationships for stories are taken to be another piece
of evidence that TIA is a valid test of inference ability.

Table 5-5

Pilot 5. Pearson Correlation Coefficients Between Reading and
Thinking Scores by Story

Story                                  Pearson's r

'UFOs' Story 1                            .77*
'Money' Story 2                           .75*
'The Wrong Newspapers' Story 3            .77*

* p < .001

Reading Performance Relationships Between Verbal Report and Written Cohorts.

Table 5-6 presents story reading score means by cohort. The maximum
reading score for a story would be 36, as each story has 12 test items, with
a total possible score of (3) per item. Means for story 1 and story 2 were
very similar for the verbal report and written cohorts. Means for story 3
differed by 2.8 in favour of the verbal report cohort. It is not clear why a
difference occurred. This difference translated into test performance would
amount to the verbal report cohort doing better on 1 item. Across the entire
test, the overall mean for the verbal report cohort is 24.7 and 23.3 for the
written cohort, a difference between means of 1.4, which translates into less
than half an item correct in favour of the verbal report cohort. Thus, the
difference in means on story 3 was not taken to be large enough to invalidate
the verbal report methodology. Asking students to think aloud does not
significantly alter their performance.
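
The arithmetic behind these comparisons can be retraced from the Table 5-6
means. The sketch below assumes the overall cohort means are the unweighted
averages of the three story means, which reproduces the reported 24.7 and
23.3; the variable names are illustrative only.

```python
# Story reading score means by cohort, as reported in Table 5-6.
verbal_report = [22.3, 24.8, 27.0]
written = [22.2, 23.6, 24.2]

MAX_ITEM_SCORE = 3  # highest reading score available on one item


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


verbal_mean = mean(verbal_report)                  # about 24.7
written_mean = mean(written)                       # about 23.3
overall_gap = verbal_mean - written_mean           # about 1.4 score points
story3_gap = verbal_report[2] - written[2]         # about 2.8 score points
gap_in_items = overall_gap / MAX_ITEM_SCORE        # under half an item
```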



Table 5-6

Pilot 5. Story Reading Score Means by Cohort

                            Story
Cohort               1        2        3

Verbal Report      22.3     24.8     27.0
Written            22.2     23.6     24.2



Mean reading scores per story for both the verbal report and written
cohorts were compared using ANOVA to determine more specifically whether the
verbal report process altered performance. Tables 5-7, 5-8, and 5-9 present
the ANOVA results for UFOs, Money, and The Wrong Newspapers, respectively.
There were no significant effects for cohort for either the UFOs or Money
stories. However, cohort showed a significant effect (p < .05) for The Wrong
Newspapers story. It is not easy to explain why a difference in performance
by cohort was found on only The Wrong Newspapers story.
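
The F values in Tables 5-7 and 5-9 follow from the sums of squares and
degrees of freedom in the usual way: each mean square is SS/DF, and F is the
effect mean square divided by the within-groups mean square. The sketch below
retraces three of the reported values. It assumes the within-groups DF for
story 3 is 111 (the scan is partly illegible there; 111 is consistent with
the reported mean square of 27.87). Function names are illustrative.

```python
def mean_square(sum_of_squares: float, df: int) -> float:
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ss_effect: float, df_effect: int,
            ss_within: float, df_within: int) -> float:
    """F = MS(effect) / MS(within)."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_within, df_within)


# Story 1 (UFOs), Table 5-7: cohort and grade effects.
f_cohort_story1 = f_ratio(0.72, 1, 3082.17, 116)     # reported as .03
f_grade_story1 = f_ratio(402.97, 2, 3082.17, 116)    # reported as 7.58

# Story 3 (The Wrong Newspapers), Table 5-9: cohort effect.
# Assumption: within-groups DF = 111.
f_cohort_story3 = f_ratio(168.94, 1, 3093.75, 111)   # reported as 6.06
```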

Grade had a significant effect on students' reading scores for story 1
(UFOs) but was not significant for stories 2 or 3. The discourse type may
account for the grade effect found for story 1. Students in grades six,
seven, and eight may all be familiar with the descriptive and narrative
discourse forms of stories 2 and 3. But students' reading scores on
expository material (story 1) might show an improvement for students in
grades seven and eight when compared to grade six students.

There was no significant interaction effect between grade and cohort. 

Table 5-7

Pilot 5. ANOVA Reading Score Results for Story 1 (UFOs) by Cohort and Grade

Source of                                                    Significance
Variation             SS        DF        MS         F          of F

Cohort                 .72       1         .72       .03        .869
Grade               402.97       2      201.48      7.58        .001
Cohort X Grade         .91       2         .46       .02        .983
Within             3082.17     116       26.57

Table 5-8

Pilot 5. ANOVA Reading Score Results for Story 2 (Money) by Cohort and Grade

Source of                                                    Significance
Variation             SS        DF        MS         F          of F

Cohort               35.49       1       35.49      1.78        .185
Grade                21.46       2       10.73       .54        .586
Cohort X Grade       14.47       2        7.24       .36        .697
Within             2277.44     114       19.98

Table 5-9

Pilot 5. ANOVA Reading Score Results for Story 3 (The Wrong Newspapers) by
Cohort and Grade

Source of                                                    Significance
Variation             SS        DF        MS         F          of F

Cohort              168.94       1      168.94      6.06        .015
Grade                 8.29       2        4.14       .15        .862
Cohort X Grade       76.93       2       38.47      1.38        .256
Within             3093.75     111       27.87

In sum, reading performance between the verbal report and written cohorts
was taken to be highly similar. Assuming that verbal reports are an accurate
representation of the thinking that went on during the test-taking and the
reports are an accurate representation of the thinking of those in the
written cohort, then it can be concluded from the evidence presented that
students understood the task and reasoned well when they picked the best
answer. In addition, the usefulness of verbal reports to understanding
students' reasoning and to validating tests is strongly supported.

The Relationship of Item Performance to Story Performance. Students in the
verbal report cohort completed only one story, so item analysis results are
presented by story for both the verbal report and written cohorts. Tables
5-10, 5-11, and 5-12 show the item/biserial correlations between reading
scores on test items and total reading score for the given story.

The item/test biserial correlation coefficients were positive for all test
items and ranged from a low of .163 (item 17) to a high of .693 (item 3).
The correlation coefficients show that generally students' performance on
individual test items was positively related to overall test performance.

The difficulty indices were computed as the proportion of students picking
the best answer. This is not the best indicator of difficulty for
scaled-answer items because it does not take account of students scoring 1s
and 2s. In Chapter Six an index computed as average score on an item is used,
but the rough index based on rights and wrongs will suffice here. A low
difficulty index (.100) would indicate a more difficult test item than a test
item with a high difficulty index (.600). Tables 5-10, 5-11, and 5-12 show the
difficulty indexes of the test items, which range from a low of .197 (item 1)
to a high of .746 (item 12). The range in the difficulty level of items was
expected and is within normally recommended bounds.
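
The two difficulty indices contrasted above can be put side by side in a
short sketch. The item scores below are invented for illustration and are not
TIA data; the function names are likewise hypothetical.

```python
def proportion_best(scores, best=3):
    """Rough index used in this pilot: share of students choosing the best answer."""
    return sum(1 for s in scores if s == best) / len(scores)


def mean_item_score(scores):
    """Index used in Chapter Six: average score (0-3) obtained on the item."""
    return sum(scores) / len(scores)


# Hypothetical scores of ten students on one scaled-answer item.
item_scores = [3, 3, 1, 0, 2, 3, 2, 1, 3, 0]

rough_index = proportion_best(item_scores)     # ignores the 1s and 2s
scaled_index = mean_item_score(item_scores)    # credits partial answers
```

On both indices a higher value indicates an easier item; only the second one
distinguishes a student who scored 2 from a student who scored 0.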

The KR-20 reliabilities calculated separately for each of the three test
stories (12 items each) for the combined verbal report and written test
cohorts were 0.57, 0.23, and 0.50 for stories 1, 2, and 3. The written test
cohort completed all 36 test items with a KR-20 reliability of 0.69.
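
For reference, the standard Kuder-Richardson 20 formula is
KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores),
where k is the number of items, p_i the proportion answering item i
correctly, and q_i = 1 - p_i. The sketch below is a minimal implementation
under two assumptions that the report does not spell out: items are scored
right/wrong (1 for the best answer, 0 otherwise), and the population variance
of total scores is used. The response matrix is invented.

```python
def kr20(responses):
    """KR-20 for a students-by-items matrix of 0/1 scores."""
    n_students = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    # Population variance of the total scores (an assumption, see above).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: four students, three dichotomously scored items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(example)
```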

In sum, the results of the relationship of item performance to overall
story performance were taken as evidence that students understood the task and
that students reasoned well when they chose the best answer. Thus, it is
concluded that TIA requires students to make a complete inference when they
select the best answer.

Table 5-10

Pilot 5. Item Analysis, Story 1 (UFOs) Verbal Report and Written
Test Cohorts

Item      Biserial        Difficulty
          Correlation     Index

 1          .577            .197
 2          .628            .648
 3          .693            .385
 4          .454            .254
 5          .492            .385
 6          .659            .369
 7          .601            .418
 8          .653            .413
 9          .608            .336
10          .441            .246
11          .541            .615
12          .454            .746

KR-20 Reliability = 0.57

Table 5-11

Pilot 5, Item Analysis, Story 2 (Money) Verbal Report and Written Test
Cohorts

Item      Biserial        Difficulty
          Correlation     Index

13          .359            .542
14          .246            .250
15          .420            .600
16          .355            .533
17          .163            .367
18          .548            .617
19          .602            .392
20          .480            .408
21          .610            .442
22          .353            .567
23          .464            .533
24          .374            .383

KR-20 Reliability = 0.23

Table 5-12

Pilot 5, Item Analysis, Story 3 (The Wrong Newspapers) Verbal Report and
Written Test Cohorts

Item      Biserial        Difficulty
          Correlation     Index

25          [illegible]     .658
26          .522            .564
27          .460            .504
28          .455            .410
29          .378            .350
30          .590            .453
31          .436            .410
32          .617            .521
33          .426            .632
34          .650            .402
35          .494            .325
36          .631            .598

KR-20 Reliability = 0.50

Reading Scores by Grade. Students' reading scores by grade level were also
examined. Since only the written test cohort in the fifth pilot study
completed all 36 test items, only their scores were used in this part of
the analysis. As the highest reading score for each test item was (3), the
maximum reading score for 36 items was 108. Reading score means and
standard deviations are presented in Table 5-13 for grade six, seven, and
eight students.

A one-way ANOVA was performed with reading score as the dependent variable
and grade as independent variable. Grade was found to have a significant
effect at the .01 level (See Table 5-14). The overall trend in mean
performance from grades 6 to 8 was a desirable result. It is assumed that
students would perform better in making inferences with each passing grade.
Since TIA is intended as a measure of inference ability in grades 6, 7, and
8, a significant difference by grade suggests that TIA is sufficiently
discriminating to detect differences in performance, should they exist, at
each grade level.

Table 5-13

Pilot 5, Reading Score Means and Standard Deviations by Grade

Grade            Mean      S.D.

6                65.1      10.8
7                70.3      12.5
8                74.6      11.5
All grades       69.9      12.1
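
As a consistency check on Table 5-13, the "All grades" mean can be recovered
as the size-weighted average of the three grade means. The sketch assumes the
written cohort sizes from Table 5-3 (30, 30, and 28 students in grades 6, 7,
and 8); variable names are illustrative.

```python
# Grade means from Table 5-13 and written cohort sizes from Table 5-3.
grade_means = {6: 65.1, 7: 70.3, 8: 74.6}
written_n = {6: 30, 7: 30, 8: 28}

total_n = sum(written_n.values())
weighted_sum = sum(grade_means[g] * written_n[g] for g in grade_means)
all_grades_mean = weighted_sum / total_n   # about 69.9, as in the table
```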



Table 5-14

Pilot 5. ANOVA Results of Reading Scores by Grade

Source of                                                   Significance
Variation           SS          DF        MS         F         of F

Grade              1338.44       2      669.22      4.98       0.009*
Residual          11424.55      85      134.41
Total             12762.99      87

Interviewer Effect on Test Performance. Verbal report students' reading
and thinking scores were analyzed by story with interviewer as the
independent variable. Two separate one-way ANOVAs were performed for each of
the three stories. Therefore, for each story there was one analysis with
reading score as the dependent variable and a second analysis with thinking
score as the dependent variable. No significant effect for interviewer was
found at the .05 level for any of the six ANOVAs calculated. Interviewer,
therefore, did not seem to affect students' reading or thinking performance
in the verbal report cohort. This was an encouraging result which provided
support for the usefulness of trial interviews to eliminate potential
interview problems which may affect the primary data collection.

Summary

The data analysis and test results discussed in the preceding sections
show the development and statistical support for TIA as a test with both
construct validity and reliability. Each subsequent version of the TIA test
was an improvement over each previous version and it was not clear what would
be gained from further data collection, so the pilot studies were considered
to be complete and the TIA test ready for final data collection.



CHAPTER SIX 

FINAL DATA COLLECTION: ANALYSIS AND RESULTS 

This chapter reports the demographics of the samples, the final data
collection procedures, and the basic statistics. It also discusses potential
extraneous influences on test performance and presents the reliability
estimates of The Phillips-Patterson Test of Inference Ability in Reading
Comprehension.

Samples and Data Collection

Nine hundred and ninety-nine students in grades 6, 7, and 8 from schools
in Alberta, Newfoundland and Labrador, Nova Scotia, and Ontario comprised the
samples for the final data collection (Data from New Brunswick arrived too
late to be included in this report). Contact was made with educators at
schools and school board offices and their cooperation was sought for the
administration of the TIA test.

When approval was granted to proceed with the project, the contact persons
were forwarded the necessary materials. They either arranged to give the
tests themselves or for classroom teachers to administer them during
scheduled language arts classes. Each participating teacher was given a copy
of the directions sheet in Appendix D. The original contact person was the
facilitator for each province. That person took responsibility for
distributing the materials, ensuring their proper administration, collecting
the materials, and returning them to The Institute for Educational Research
and Development at Memorial University of Newfoundland. The final data
collection took place during the winter and spring of 1987.

Sample Demographics

Students in the Alberta sample were from an urban centre with a population
of approximately 50,000. It is a trading centre for an agricultural-based
economy. The students were described by their teachers as mostly middle
class. The schools range in population from 150-650. Less than 4 percent of
the school population is English as a second language (ESL) students or
native students. Classes were described as having a few bright students, a
majority of average students, and some students requiring additional
assistance with instruction through remediation classes or learning
assistance programs.

The Newfoundland and Labrador sample of students were from two large rural
centres. In one centre the population, including surrounding villages, is
approximately 10,000. The area may be described as economically depressed
with the majority of families described as low to middle class. The students
were from a school with a total population of 430 students. None of the
students are ESL students; however, some have been and are involved in French
immersion programs. Classes were described as heterogeneous. The other rural
centre has a combined population of approximately 14,000 people in two
adjacent towns. It is a mining centre with a high employment rate and, for
the most part, middle class families. The students are from schools ranging
in size from 350-600 students. There are no ESL students, but French
immersion programs are common. The classes were described as heterogeneous.
Teachers reported an extensive availability of resources and extracurricular
opportunities in these schools.

Students in the Nova Scotia sample were from two rural areas ranging in
population from 2000-5000 people. The economy is farm-based, with the
communities comprising a mixture of lower and middle class families. The
populations of the two schools were 200 and 275 with no ESL students. Classes
were described as heterogeneous. About half the students participated in
sports activities and school clubs. Students were generally described as
enthusiastic and willing to learn.

Students in the Ontario sample were from an urban centre with a population
ranging from 40,000 to 110,000 including the surrounding areas from which
children are bussed to the city schools. The students were from a wide range
of economic levels, and many from single parent homes. Classes were described
as heterogeneous with about 20 percent of the students requiring remedial
instruction. Less than 3 percent of the school population includes ESL
students, and brighter students often go into French immersion programs after
grade 4. The economy in the area is built on service and government
institutions.

Analysis and Results 

Students recorded their answers to the TIA questions on a standardized
answer sheet. Each question has four possible answers (A, B, C, or D). Each
answer is worth a value of either 0, 1, 2, or 3 dependent upon the quality
of the selected response. The appropriate value was assigned to each selected
response and the assigned values totalled to constitute the test score for
each student (See Appendix E for Key to TIA Answer Scales).
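
The scoring rule just described can be sketched as a lookup of each selected
letter in a per-item scale followed by a sum. The key below is invented for
three items purely for illustration; it is not the TIA key in Appendix E, and
the names ANSWER_SCALES and score_test are hypothetical.

```python
# Hypothetical answer scales for three items (NOT the real TIA key).
# Each item maps the four choices to a value of 0, 1, 2, or 3.
ANSWER_SCALES = [
    {"A": 3, "B": 1, "C": 0, "D": 2},
    {"A": 0, "B": 2, "C": 3, "D": 1},
    {"A": 1, "B": 3, "C": 2, "D": 0},
]


def score_test(responses, scales):
    """Sum the values attached to a student's selected answers."""
    if len(responses) != len(scales):
        raise ValueError("one response is required per item")
    return sum(scale[choice] for scale, choice in zip(scales, responses))


best_student = score_test(["A", "C", "B"], ANSWER_SCALES)   # all best answers
mixed_student = score_test(["D", "B", "A"], ANSWER_SCALES)  # partial credit

# With 36 items worth up to 3 each, the maximum total is 108.
max_total = 36 * 3
```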

Test Score Results for Province, Sex, Grade, and Age

The mean scores for each province (total possible score is 108) are
presented in Table 6-1. No claims about the representativeness of these means
for each province are made. They are probably more a reflection of local
rather than provincial factors.

Table 6-1 also presents mean performance scores for the entire sample by
sex, grade, and age. The mean for the entire sample of students for whom data
was complete (N = 974) is 73.57 with a standard deviation of 13.63. Table 6-2
shows the ANOVA main effects on these same variables. Of particular
relevance in this report are the main effects of sex, grade, and age on test
score.

Table 6-1

Mean Scores for Province, Sex, Grade, and Age

Variable                          Mean        N

Province
  Alberta                         77.74      385
  Newfoundland & Labrador         70.51      458
  Nova Scotia                     72.92       90
  Ontario                         69.94       66
Sex
  Male                            72.57      518
  Female                          74.4?      481
Grade
  Grade 6                         70.77      324
  Grade 7                         72.99      330
  Grade 8                         76.47      344
Age
  11 Years                        71.99      218
  12 Years                        73.29      299
  13 Years                        75.14      33?
  14 Years                        72.95      126

Table 6-2

ANOVA Results on Final Data: Test Score by Province, Sex, Grade, and Age

Source of          Sum of                  Mean                 Significance
Variation          Squares       DF        Square        F         of F

Province           7940.10        3       2646.70      17.12       .000
Sex                 576.38        1        576.38       3.73       .054
Grade              2498.39        2       1249.20       8.08       .000
Age                2425.09        3        808.36       5.23       .001
Prov X S            812.07        3        270.69       1.75       .155
Prov X G           2223.86        6        370.64       2.40       .026
Prov X A           2389.19        9        265.47       1.72       .081
Sex X Grade        1122.91        2        561.45       3.63       .027
Sex X Age           827.75        3        275.92       1.78       .148
Grade X Age        2595.26        4        648.81       4.20       .002
Within           141319.11      914        154.62

In the case of significant sex differences, the females' mean performance
is higher than the males'. A comparison of the means presented in Table 6-1
indicates a difference of about two points. Based on the perplexing welter of
research findings on differences between males and females, it seems many
questions remain unanswered (Downing, May, & Ollila, 1982). Questions about
such matters as the effects of different cultural expectations, genetic
factors, and teacher-model differences all seem to point to the necessity of
further research prior to the drawing of any conclusions. Differences in this
data on the basis of sex are minimal. Furthermore, differences in performance
between the sexes are small in comparison to the range of differences in
performance within a sex. For these reasons it is not believed that the TIA
test is biased in favour of any sex, nor that the data is untrustworthy for
generalization purposes regardless of sex.

Table 6-2 shows grade and age significant at the p < .05 level. In the case
of significant differences by grade it is important to know whether the
differences are between grades 6 and 7, grades 7 and 8, or grades 6 and 8.
Scheffé's a posteriori comparison of means test was done (Kirk, 1968). While
Scheffé's S method allows for the calculation of significant differences in
means when there are unequal n's, it does set the highest critical statistic
of all the multiple comparison tests. The only critical difference in means
on the TIA test was between grades 6 and 8, where the difference in means
(5.70) exceeds the critical value of 3.87. It is likely that, had it been
appropriate to use a less rigorous test, differences between grades 6 and
7 and between 7 and 8 would have been found. Nevertheless, the significant
differences between grade 6 and grade 8 indicate a general tendency for
performance to improve with grade level.
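
The pairwise comparison reported here can be retraced by setting each
difference in grade means against the stated critical value of 3.87. The
sketch takes the grade means from Table 6-1; the grade 8 mean is partly
illegible in the scan and is assumed to be 76.47, which agrees with the
reported 6-to-8 difference of 5.70. The critical value is taken from the text
as given, not recomputed.

```python
from itertools import combinations

# Grade means from Table 6-1 (grade 8 value is an assumption, see above).
grade_means = {6: 70.77, 7: 72.99, 8: 76.47}
CRITICAL_DIFFERENCE = 3.87  # Scheffe critical value as reported in the text

comparisons = {}
for lower, higher in combinations(sorted(grade_means), 2):
    difference = round(grade_means[higher] - grade_means[lower], 2)
    comparisons[(lower, higher)] = (difference, difference > CRITICAL_DIFFERENCE)
```

Only the grade 6 versus grade 8 comparison exceeds the critical value, which
matches the conclusion drawn in the paragraph above.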

Age differences are not easy to separate from grade differences. The mean
of grade eight students is higher than the mean of thirteen and fourteen year
olds. In addition, the number of thirteen and fourteen year olds is 114 more
than the number of students in grade eight. It seems that some of the
thirteen and fourteen year olds are outliers and repeaters. Those students
who are age and grade appropriate generally perform better. Therefore, the
mean of the grade eight students is higher than the means of the thirteen and
fourteen year olds. So, age and grade may not represent the same differences
in performance.

Item Statistics

Item Difficulty

Typically, difficulty level indices reported on a multiple-choice test are
given as the proportion of students getting an item right, but on a standard
multiple-choice test the only scores are 1 and 0, where 1 is for the correct
choice and 0 is for any other choice. Thus, the proportion of people getting
the item right, that is the difficulty level, is just the average performance
on the item. So by extension, the difficulty level index for the TIA test in
which possible scores are 0, 1, 2, or 3 is again the average performance on
the item. See Table 6-3 for the percentage breakdown of students who chose an
answer worth 0, 1, 2, or 3 for each test item (total of 36 items). The
percentage of students who chose answers other than the best (the most
consistent and complete) for each item reflects the variability in
performance. Item difficulties are given in Table 6-4. When reading this
table, note that the higher the difficulty level index the easier the item.
As can be seen from Table 6-4, there is a range of difficulty levels across
the test. For instance, many students found item 12 fairly easy whereas item
4 appears to have been more difficult for them. These indices represent a
range of challenge for students.
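
The link between the percentage breakdown in Table 6-3 and the difficulty
index in Table 6-4 can be illustrated for item 1. The sketch averages the
score values within each grade and then weights the grade averages by the
number of students per grade (324, 330, and 344, from the footnote to Table
6-3). This weighting is an assumption about how the index was pooled; it
comes out close to the 1.338 listed for item 1.

```python
# Item 1, Table 6-3: percentage of students obtaining scores 0, 1, 2, 3.
item1_percentages = {
    6: (37, 17, 27, 19),
    7: (40, 11, 24, 25),
    8: (38, 10, 27, 25),
}
students_per_grade = {6: 324, 7: 330, 8: 344}


def mean_score_from_percentages(percentages):
    """Average item score implied by a percentage breakdown over scores 0-3."""
    return sum(score * pct for score, pct in enumerate(percentages)) / 100


by_grade = {g: mean_score_from_percentages(p)
            for g, p in item1_percentages.items()}

total_students = sum(students_per_grade.values())
difficulty_index = sum(by_grade[g] * students_per_grade[g]
                       for g in by_grade) / total_students
```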

The Relationship of Item Performance to Overall Test Performance

Typically, the item/test correlation is computed using a biserial
correlation coefficient. This is a correlation between dichotomous (0,1) and
continuous variables (0-36 range of items). However, on the TIA test item
scores are not merely dichotomous variables, but rather interval variables
(0, 1, 2, and 3), so the appropriate statistic is the Pearson's r. Table 6-4



Table 6-3

Percentage of Students Obtaining Each Possible Score

        Grade 6*            Grade 7**           Grade 8***
Item     0   1   2   3       0   1   2   3       0   1   2   3

 1      37  17  27  19      40  11  24  25      38  10  27  25
 2       6  10  17  67       5  11  12  72       5   7   9  79
 3      12  28  10  50      13  28   7  52      17  23   7  53
 4      29  34  16  21      26  29  17  28      25  31  22  22
 5       4  18  43  35       3  17  33  47       2  20  30  48
 6      19  19  11  51      13  18   7  62       8  20   5  67
 7      23  19   9  49      15  17   8  60      15  17   9  58
 8      15  16  16  53      18  14  16  52      17  11  16  56
 9       4  51   6  39       1  43   5  51       2  39   4  55
10       6  29  34  31       6  28  33  31       4  22  35  39
11       3  19  21  57       3  16  19  62       3  15  16  66
12       5   8   8  79       3   9   6  82       6   6   7  81
13      12  16  13  59      13  14   9  64       8  13  10  69
14       5  59  11  25       5  69   8  18       4  68   5  23
15      47   2   5  46      34   6   8  52      29   3   5  63
16      10  14  13  63       6  10  17  67       7   5  18  70
17      12  24  38  26       9  30  37  23      10  22  44  24
18      23  12  11  54      22  10   9  59      18  10   9  63
19      16  14  27  43      18  14  25  43      17  13  27  43
20      15  16  23  46      11  20  20  49       8  19  17  56
21      15  16  15  54      13  12  11  64       7  13  17  63
22       4  17  26  53       3  17  27  53       3  18  22  57
23      25   9   6  60      22   8   4  66      21  10   6  63
24       7   7  48  37       5  11  47  37       5  11  43  41
25       ?   ?   7  75       ?   ?  11  71       ?   ?   ?   ?
26       ?   4  14  59       ?   ?  15  58       ?   5  11  58
27      26  11  20  43       ?  11  17  45       ?   ?  21  46
28       ?   9  30  45      17   9  28  46       ?   9  36  42
29      11  11  45  33       ?   ?   ?  41      11  11   ?   ?
30       ?   ?  10  49       ?   ?   ?   ?       ?  11  11   ?
31       ?   ?   ?  44       ?   ?   ?  46       ?   ?   ?   ?
32       ?   ?   ?   ?      34   5  13  48      28   2  14  56
33      15   8  20  56      20   8  21  51      12  12  18  58
34      34   8  21  37      27   8  22  43      19  10  22  49
35      21  11  45  23      23  12  40  25       8  13  12  67
36      14  16  11  59      19  11  11  59       8  13  12  67

*   324 students
**  330 students
*** 344 students



Table 6-4

Item Statistics

Item      Item/Test        Item Difficulty
          Correlation      Index

 1          .212             1.338
 2          .223             2.531
 3          .190             1.961
 4          .151             1.387
 5          .313             2.183
 6          .359             2.153
 7          .326             2.021
 8          .224             2.063
 9          .376             1.998
10          .229             1.982
11          .198             2.396
12          .266             2.632
13          .185             2.277
14         -.010             1.469
15          .222             1.766
16          .263             2.414
17          .055             1.781
18          .220             2.053

Item      Item Difficulty
          Index

19          1.943
20          2.101
21          2.241
22          2.302
23          2.078
24          2.169
25          2.501
26          1.825
27          1.825
28          2.501
29          2.072
30          1.906
31          2.145
32          1.816
33          2.132
34          1.806
35          1.736
36          2.199

Item/Test Correlations, items 19-36 (17 values legible, in source order):
.179  .281  .388  .162  .295  .403  .172  .195  .325  .427  .388  .478
.303  .451  .334  .357  .485

presents the item/test correlations. Performance on individual items is
typically assumed to be positively related to performance on the test as a
whole. Since TIA is a test of inference ability in reading comprehension,
each item should be an inference question, so it would be expected that
similar results would be attained across items. However, inference-making is
complex. Some inference questions may be logical, informational, or
evaluative in nature and yet be broadly related by general categories such as
beyond text or across text (Wixson, Peters, Weber, & Roeber, 1987). Thus, it
can be imagined that inference questions may be unrelated on other
dimensions. One such dimension is context. Context affects reasoning
([names illegible], 1984; Spiro & Myers, 1984; Stanovich, 1980). Each question
is presented within the context of an unfolding story, yet each question is
presented in a slightly different context, so similar correlations would not
be expected on all items.

Guided by this presumption, any items with essentially zero or negative
item biserial correlations were noted. Table 6-4 shows that there was one
negative item/test biserial correlation (item 14) and one which was
essentially zero (item 17). Item 14, as can be seen from Table 6-[digit
illegible], was answered by the greater proportion of students as a
non-inference question (a score of 1). In other words, students chose a
text-based response. Such a response by the majority of students points to a
problem with either the wording of the question or a perceived high
similarity among answer choices on the part of the students. A reexamination
of students' verbal reports from the last pilot study indicated a problem
with word choice in item 14. A revision has been made for future uses of TIA.
Item 17, also on the Money story, demanded a high level of understanding of
the features of money which may have been unfair to the students. A
reexamination of students' verbal reports corroborated the suspicion that in
order to choose a 2-point answer, they made a sophisticated, complete, and
consistent inference. The scoring key was modified for item 17.
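
The screening step described above can be sketched in a few lines of Python. The correlations are the values listed for items 1-18 in Table 6-4; the cut-off of .10 for "essentially zero" is an assumption made for illustration only and is not a threshold stated in this report.

```python
# Item/test correlations for items 1-18, transcribed from Table 6-4.
ITEM_TEST_CORRELATIONS = {
    1: 0.212, 2: 0.223, 3: 0.190, 4: 0.151, 5: 0.313, 6: 0.359,
    7: 0.326, 8: 0.224, 9: 0.376, 10: 0.229, 11: 0.198, 12: 0.266,
    13: 0.185, 14: -0.010, 15: 0.222, 16: 0.263, 17: 0.055, 18: 0.220,
}


def flag_weak_items(correlations, cutoff=0.10):
    """Return the item numbers whose item/test correlation is below `cutoff`.

    `cutoff` is a hypothetical threshold chosen for this sketch.
    """
    return sorted(item for item, r in correlations.items() if r < cutoff)


if __name__ == "__main__":
    print(flag_weak_items(ITEM_TEST_CORRELATIONS))
```

With this assumed cut-off, the only items flagged among items 1-18 are the two discussed in the text, items 14 and 17.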

The Relationship of Story Performance to Overall Performance

The Phillips-Patterson Test of Inference Ability in Reading Comprehension
is made up of three of the most common discourse forms found in the middle
grades. Research indicates that narrative, descriptive, and expository texts
make distinct demands upon readers; thus it seems that differences in
comprehensibility between narrative and descriptive and expository texts
should be expected. Table 6-5 shows the percentage of all responses
obtaining scores of 0, 1, 2, and 3 by grade level and story. Seventy-two
percent of all responses on The Wrong Newspapers are quality responses
(scores of 2 and 3) compared to sixty-six percent on the UFOs story and
sixty-eight percent on the Money story. The Wrong Newspapers story is a
narrative, the discourse form taken to be the easiest of the three, yet the
differences in performance are not as dramatic as expected. This result
raises an interesting question.

A question which remains to be studied is whether particular types of
inference questions, regardless of discourse form, present more of a
challenge to students than others. There is circumstantial evidence from the
pilot studies and from questions rated as difficult on the final study to
support such a suspicion. Logical inference questions on TIA that required
the synthesis of several pieces of information were more likely to be
answered in an incomplete manner than informational inference questions such
as elaboration or setting the context, regardless of the discourse form. A
plausible explanation for the minimal performance differences displayed in
Table 6-5 is that inference questions requiring the synthesis of several
pieces of information were asked on all three discourse forms. If students
experience difficulty with logical inference questions as suggested, then
perhaps the type of question asked is an important variable in addition to
the discourse form being studied.

Table 6-5

Percentage of All Responses Receiving Scores of 0, 1, 2, and 3 by Grade Level
and Story

               UFOs                Money             Newspapers
Grade     0    1    2    3     0    1    2    3     0    1    2    3
  6      14   22   18   46    16   17   19   48    21    9   22   48
  7      12   20   16   52    13   18   19   50    21    8   21   50
  8      13   20   16   53    12   17   19   52    16    8   23   53

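
The "quality response" figures discussed with Table 6-5 are the sum of the percentages for scores of 2 and 3. A minimal Python sketch of that arithmetic follows, using the percentages as transcribed from the table (one partly legible cell, the grade 7 Newspapers score-0 value, is read here as 21 because the row then sums to 100). The function and variable names are illustrative. Because these are per-grade figures and the transcription may carry scanning errors, they need not reproduce the pooled percentages quoted in the text exactly.

```python
# Percentages of responses scored 0, 1, 2, 3, transcribed from Table 6-5.
# Grade 7 Newspapers: the score-0 cell is partly illegible and is read as 21.
TABLE_6_5 = {
    6: {"UFOs": (14, 22, 18, 46), "Money": (16, 17, 19, 48), "Newspapers": (21, 9, 22, 48)},
    7: {"UFOs": (12, 20, 16, 52), "Money": (13, 18, 19, 50), "Newspapers": (21, 8, 21, 50)},
    8: {"UFOs": (13, 20, 16, 53), "Money": (12, 17, 19, 52), "Newspapers": (16, 8, 23, 53)},
}


def quality_percent(row):
    """Percentage of responses that earned a score of 2 or 3."""
    return row[2] + row[3]


def quality_by_story(table):
    """Map each story to a {grade: quality percentage} dictionary."""
    result = {}
    for grade in sorted(table):
        for story, row in table[grade].items():
            result.setdefault(story, {})[grade] = quality_percent(row)
    return result


if __name__ == "__main__":
    for story, by_grade in quality_by_story(TABLE_6_5).items():
        print(story, by_grade)
```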



Potential Extraneous Influences
Potential extraneous influences on performance on the TIA test include
such factors as test-taking strategies, test wiseness, and guessing. While
these are not mutually exclusive, I will deal with each separately.

Test-taking Strategies
Care was taken to provide clear, unambiguous directions to all TIA
test-takers. Students were informed that they were to use information
provided in the story and information they already knew in deciding upon the
best answer. They were told to think about which answer out of four they
thought was the best one. A sample item was done with the students (see
Appendix D for a copy of the complete instructions). Special mention was made
of the importance for students to consider all possible answers before
deciding on the best one. Students were informed of the scoring system.

TIA is a power test, so no strict time limits were set. Students were told
that it takes about a class period or so to do the test. Teacher reports of
the final data collection indicated that most students finished the test
in about 30-35 minutes, excluding time for directions (total time
approximately 40-45 minutes). The intent was to allow students time to think
and to carefully consider their answer choices without the pressure of a
speed component. Test users may use the average completion time of 30 minutes
as an indication of how their classes compare with others in time taken to do
the test.

During the development of TIA, attention was given to important rules for
test construction which are in harmony with sound established measurement
principles (Standards for Educational and Psychological Testing, 1985). These
include rules such as avoiding items with negative questions, using
qualifiers such as "always" and "usually" cautiously, and avoiding item stems
similar to text information. Other factors which may have contributed to
test-taking strategies were considered in the test refinement process and
have been reported in Chapter Five.

Test Wiseness

There is a sense in which test wiseness has to do with general wiseness or
perceptiveness. Students may capitalize upon cues of various sorts which
will result in improved performance on a test for reasons other than use of
the ability being tested. Students' verbal report protocols were studied in
each of the pilot studies for evidence of the use of such cues. Test
revisions were made if cues were suspected. Test revisions were discussed in
Chapter Five.

Guessing

In the case of a short-answer test students guess only if they construct a
response, which reduces the risk of attaining a higher score due to guessing.
On the other hand, in the case of a multiple-choice test the probability of
attaining a higher score due to guessing is greater (Slakter, 1967).

The scaled-answer scoring system (scores of 0, 1, 2, 3) used on TIA
further complicates the issue of guessing, because the probability of a
student getting some positive score for each item on TIA is .75. Contrast
this with the standard four-option multiple-choice test with one correct
answer, where the probability of getting a positive score is only .25 on each
item. Given the unusual scoring system used on TIA, it would be instructive
to examine the distribution of scores which could be obtained through
guessing (Johnson & Kotz, 1977; Larsen, 1969).

Table 6-6 presents the distribution of total possible test scores (0-108)
and their corresponding cumulative probability under the assumption of random
guessing on each item. Note that since there is a considerable chance of
scoring points through guessing, it is virtually impossible to guess and
receive less than about 40. Even for a total score of 54, which would be 50
percent, there is a 47 percent chance of attaining at least this high by
guessing. However, if you look at a total score of 58, only 4 points higher,
the chances of attaining a score of 58 or higher are dramatically reduced to
25 percent.
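
A distribution of this kind can be regenerated with a short Python sketch. It assumes, as the .75 figure above implies, that each of the 36 items offers exactly one option worth 0, 1, 2, and 3 points and that a guessing student picks each option with probability .25, independently across items. The function names are illustrative and this is not the procedure the report itself used to build Table 6-6.

```python
def guessing_distribution(n_items=36, option_scores=(0, 1, 2, 3)):
    """Probability of each total score when every item is answered at random.

    Returns a list `dist` where dist[t] is P(total score == t).
    """
    p_option = 1.0 / len(option_scores)
    dist = [1.0]  # before any item is answered, the total is 0 with certainty
    for _ in range(n_items):
        new_dist = [0.0] * (len(dist) + max(option_scores))
        for total, prob in enumerate(dist):
            for score in option_scores:
                new_dist[total + score] += prob * p_option
        dist = new_dist
    return dist


def cumulative(dist):
    """Running sum: cdf[t] is P(total score <= t)."""
    cdf, running = [], 0.0
    for prob in dist:
        running += prob
        cdf.append(running)
    return cdf


if __name__ == "__main__":
    cdf = cumulative(guessing_distribution())
    for score in (40, 54, 58, 60, 73):
        print(score, round(100 * cdf[score], 2))
```

Under these assumptions the cumulative percentage is about 53 at a total score of 54 and about 75 at a total score of 58, in line with the 47 percent and 25 percent figures quoted above.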

The overall mean score on TIA is 73.48 and, as can be seen from Table 6-6,

Table 6-6

Distribution of Total Test Scores Under the Assumption of Random Response

Total Score   Cum. %      Total Score   Cum. %      Total Score   Cum. %




1 2.12E-20 37 ,,7 

2 1.49E-17 38 i 0 II 

3 1.94E-16 39 'sJ '2 

4 1.93E-15 40 i"f8 7. 

5 1.58E-14 41 3 10 

^ 42 4 31 7« 

7 6.75E-13 43 588 70 

^ ^'■68E-12 44 7 65 II 

10 8.23E-11 46 32 11 

11 3.44E-10 47 A 7 II 

13 4.90E-9 49 25 2 li ''^ 

50 30 2 8. 

15 5.47E-8 51 35 5 It 

16 1.69E-7 52 4l"' !l 

17 4.96E-7 53 : ^3 100 

18 1.39E-6 54 530 100 

19 3.74E-6 55 If I '° 

20 9.64E-6 56 645 II 
2^ 2.39E-5 57 J,. Iz 

22 5.71E-5 58 748 

23 1.31E-4 59 79^ !^ 

24 2.92E-4 H ^ 100 

25 6.30E-4 61 87 0 I7 

26 ^•31E-3 62 90 0 H '^0 

27 2.65E-3 3 20 H 

2« 5-21E-3 64 94 0 " 1^0 

29 9.92E-3 65 957 n 

30 .018 66 9A 3 

31 .033 67 97*« 100 

32 .058 68 98 4 100 
-100 9 3 9 \'t 100 

34 .166 70 99% 100 

35 .271 J, ll'l 106 100 

36 .430 7 9 '7 



108 100 






there is virtually no chance of a student getting this score or higher from
guessing. Indeed the chances of getting any score of 60 or higher through
guessing are quite low. The pattern of probability distributions displayed in
Table 6-6 indicates that while scores up to about 60 can be expected to
reveal very little about inference ability because of the guessing factor,
scores above this point are virtually unattainable through guessing.

You will recall from the discussion of Pilot Study Five in Chapter Five
that there were highly significant correlations between reading scores and
thinking scores (see Table 5-4) and that there were no significant
differences in performance (see Tables 5-6, 5-7, 5-8, 5-9) between the verbal
report and the written response cohorts. These two points are worth
mentioning here in terms of rounding out this discussion on guessing. The
first point provides evidence that generally when students thought well they
selected the best answer and that students who reasoned poorly selected an
alternate answer. An examination of the verbal report protocols showed that
despite the opportunity of having a best-answer option, students generally
chose the answer that made most sense to them, the one that they could
justify. It would seem, then, that there was much more going on than guessing.

The second point, that there were no significant differences in performance
between the verbal report and written cohorts, may point to a uniqueness in
the nature of the task. Recall that Pilot Study Two showed minimal
differences in the amount of time taken by students to complete the
multiple-choice and short-answer formats, suggesting that the reasoning
demands of the task were similar. TIA is a test of inference ability, that is,
the ability to integrate relevant textual information and background
knowledge, and requires reasoning regardless of response format. In other
words, determining the best answer on TIA requires making an inference
regardless of the format of the test. This argument is very similar to one
found in the area of mathematics, where it has been argued that format does
not matter in test performance (Traub & Fisher, 1977).

Kuder-Richardson 20 Reliability Indices
Table 6-7 gives means, variances, and KR-20 reliabilities for each story
and for the total test. The Kuder-Richardson 20 reliability estimates are
conservative; they give a lower-bound estimate of reliability on a test.
Nevertheless, it would be fair to say that TIA's reliability of .79 is highly
satisfactory given the number of test items (36) and the reported
reliabilities of similar tests requiring students to reason well, such as the
following: Test on Appraising Observations (50 items) .[digit illegible]9;
Cornell Critical Thinking Test, Level X (71 items) .85; and the
Watson-Glaser Critical Thinking Appraisal, Form A (80 items) .80.
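
For readers who want to see how a reliability index of this kind is obtained, the Python sketch below uses the variance form of the coefficient, k/(k-1) multiplied by (1 minus the sum of item variances over the total-score variance). For items scored 0 or 1 this is KR-20; for items scored 0-3 it is the usual generalization (coefficient alpha). The report does not state which computational form was used for TIA, so this choice is an assumption, and the small score matrices in the usage lines are invented for illustration.

```python
from statistics import pvariance


def kr20_alpha(score_matrix):
    """Variance-form reliability for a students-by-items score matrix.

    `score_matrix` is a list of rows, one per student, each row holding that
    student's score on every item. The total-score variance must be non-zero.
    """
    n_items = len(score_matrix[0])
    item_variances = [
        pvariance([row[i] for row in score_matrix]) for i in range(n_items)
    ]
    total_variance = pvariance([sum(row) for row in score_matrix])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: two items that always agree give perfect consistency.
    print(kr20_alpha([[0, 0], [1, 1], [2, 2], [3, 3]]))
    # Hypothetical data: two unrelated 0/1 items give a coefficient of zero.
    print(kr20_alpha([[1, 0], [1, 1], [0, 0], [0, 1]]))
```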




Table 6-7

Means, Variances, and KR-20 Reliabilities for Each Story and Total Test

Aspect                         Mean    Variance    KR-20
UFOs (items 1-12)             24.65       28.89      .60
Money (items 13-24)           24.59       24.95      .49
Newspapers (items 25-36)      24.23       53.54      .77
Total Test (items 1-36)       73.48      183.67      .79



CHAPTER SEVEN
SUMMARY OF PRESENT EFFORTS AND FUTURE PROSPECTS
This report has described the design and development of a test of
inference ability in reading comprehension. It is a scaled-answer
multiple-choice test intended for use with students in grades six, seven, and
eight.

The Phillips-Patterson Test of Inference Ability in Reading Comprehension
is based upon what is currently known about inference and is in accord with
the best available principles and information as described in Chapter Three.
The principle of inference appraisal and the work reported here are not meant
in any sense to be definitive, but rather are meant to be a chart in what is
an uncharted testing area. It is to be seen as an important starting point
open to extension. The objectives, design, and evolution of the test reported
in the preceding chapters represent a comprehensive methodology aimed at
validly appraising inference ability in reading comprehension.

In order to have construct validity in tests of reading comprehension we
must seek out the causes of performance on them (Phillips, 1986). Responses
on measures of reading comprehension may be correct or incorrect for very
different reasons. Correct responses are not sufficient evidence of
comprehension because sometimes they are the result of minimal reasoning.
Conversely, incorrect responses are not sufficient evidence of a lack of
comprehension, because sometimes they are a result of comprehension.
Students' verbal reports and written explanations as to why they made their
choices are ways to seek out causes of performance.

The final version of the Phillips-Patterson Test of Inference Ability in
Reading Comprehension, a copy of which is provided in Appendix A, represents
a reform over conventional tests of reading comprehension in at least three
ways. First, TIA is an aspect-specific test aimed at providing detailed
information on young readers' inference ability in reading comprehension
rather than being a general test of reading comprehension. Unfortunately,
most tests of reading comprehension yield at best superficial and
nondescriptive information about reading ability. Part of the problem may be
attributed to the complexity of the reading comprehension process. While a
thorough definition of reading comprehension remains elusive, it seems that a
worthwhile approach is to study aspects of the reading process. The TIA test
is the first test known to focus specifically on inference in
reading comprehension. This is not to say that considerable overlap does not
exist among inference and the other reading processes in the actual process
of reading comprehension. To study the nature of such overlap would be an
interesting project.

The second unique feature of the TIA test is its balanced concentration
on the appraisal of inference ability across the three most representative
discourse forms found in the middle grades: narration, exposition, and
description. This gives TIA a higher degree of content validity than most
general tests of reading comprehension that do not take the three discourse
forms into account.

The final feature, and perhaps the most important for instructional
purposes, is that TIA allows for ranges of sophistication in inference
ability. Inference-making is often a less than complete process, so a
scaled-answer scoring system was developed to offer credit for partially
correct responses rather than a conventional model where credit is given only
for the correct answer.
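
The contrast between scaled-answer scoring and conventional right/wrong scoring can be illustrated with a brief Python sketch. The scoring key below is entirely hypothetical (the item labels and point assignments are invented for this example and are not taken from the TIA key); only the 0-3 point scale comes from the report.

```python
# Hypothetical key: for each item, the points (0-3) earned by each option.
HYPOTHETICAL_KEY = {
    "item_a": {"A": 3, "B": 0, "C": 1, "D": 2},
    "item_b": {"A": 1, "B": 2, "C": 3, "D": 0},
}


def scaled_score(answers, key):
    """Total under scaled-answer scoring: partial credit for weaker options."""
    return sum(key[item][choice] for item, choice in answers.items())


def conventional_score(answers, key):
    """Total under conventional scoring: one point only for the best option."""
    return sum(1 for item, choice in answers.items() if key[item][choice] == 3)


if __name__ == "__main__":
    answers = {"item_a": "D", "item_b": "C"}  # one partial answer, one best answer
    print(scaled_score(answers, HYPOTHETICAL_KEY))
    print(conventional_score(answers, HYPOTHETICAL_KEY))
```

In this invented case the student who chose a partially correct option on the first item keeps 2 of its 3 points under scaled scoring but would receive no credit for it under the conventional model.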

Central to this work are future prospects. The completion of a manual
which will allow for diagnostic information for instructional decision-making
purposes is the next immediate project. Diagnostic information will be
reported in a manner that describes students' performance in terms of the
quality of the inferences they have made, and the variations in inference
ability across question types and across discourse forms. Such
process-oriented information provides the necessary understanding of where
students need instruction (Frederiksen, 1984), and to that end specific
teaching suggestions will be offered. The development of a short-answer form
of the test which would allow for a more direct evaluation of the effect of
background beliefs and levels of sophistication on students' performance is
also planned.

Other prospects include a study to identify the kinds of strategies
students use in attempting to understand the various discourse forms, to
measure the effectiveness of those strategies, and to explore the claim that
reading well is thinking well by studying the relationships among inference
strategy use, good inference-making in text comprehension, overall reading
comprehension, and critical thinking performance.

A future prospect of a more collaborative nature is to study the seemingly
multiple perspectives on the appraisal of inference ability through an
examination of the types of inference demands made by the TIA test compared
with those on current but more general comprehension assessment projects
(Valencia & Pearson, 1987; Wixson & Peters, 1987).






REFERENCES 






Adams, M., & Bruce, B. (1980). Background Knowledge and Reading Comprehension
(Reading Education Report No. 13). Cambridge, MA: Bolt, Beranek, & Newman.

Amiran, M.R., & Jones, B.F. (1982). Toward a new definition of readability.
Educational Psychologist, 17, 13-30.

Anderson, R.C. (1972). How to construct achievement tests to assess
comprehension. Review of Educational Research, 42, 145-170.

Anderson, R.C., Spiro, R.J., & Anderson, M.C. (1978). Schemata as scaffolding
for the representation of information in connected discourse. American
Educational Research Journal, 15, 433-440.

Anderson, R.C., & Pearson, P.D. (1984). A schema-theoretic view of basic
processes in reading comprehension. In P.D. Pearson (Ed.), Handbook of
Reading Research (pp. 255-292). New York: Longman Inc.

'"JaUo°n""nf"R"/'"'%H'' '^''^'^^^"^ '1985) Becoming a 
Nation of Readers. Champaign, IL.: Center for the Study of Reading. 

%'''os!orn""; R f = t er , B B . ( 1 984) . Content area texts. In R.C. Anderson, 
nl.^lrl \ r : .'-"'l L earning to R^ad in Ameriran c^.^oolsi Basal 

Associates' ^""224). Hillsdale, N.J.: Lawrence Erlbaum 

Applebee, A.N., Langer, J.A., & Mullis, I.V. (1987). The Nation's Report Card.

'"'"Ill ; J''"?'"/ ' ^"'^ RP^.^nn,n, 

Princeton, N.J.: Educational Testing Service! 

'Trlnl\%l,lZ''' ^-^-^-"'"qY of MPaninn ny_ ^erbal Lsarninn . New York: 

%tki.'-U(7)'^''^'^' inferences in reading. 

^'rnip'n; '^'^2). From Conversation to composition: The 

role 0 instruction ir a developmental process. In R. SUser (e5.), Adv;nces 

:;ib:urA:::;"i::^^^"^"^^ ^ ' ,.,4,. Hinsdai., n.j.;l-i^ 

^^Rpadinn""! ^ n'^o' ' " ^ ^ i on s i n t he Soc i 0 1 i ngu i s t i c St udy of 
m!! ^" Handbook of R eading R.searrh (pp. 395-421 ). 
-lew York: Longman Inc. 

'Th 'cl'rr -(EdT'Th J'' /i"""''" structure and mental models. In 

Dev^lnnlpn ' . "!""V°P''""* ""ding Skillst New Dirpr t ions for Child 

Development (No. 27). San Francisco, Ca: Jossey-Bass. 

''frpsvch^lcnJ'''!' p'^'T' '''V^ ''''"''"^ '''' stylistics: Implications 
ssues n rI'h- ; ^"i*""' ^ ^"'^er (Eds.), Theoretical 

O eT^St^ so a U^'^ ComprPhPnsinn (pp. 221-239). Hillsdale, N.J.: Lawrence 






Calfee, R.C. (1987). The school as a context for assessment of literacy.
The Reading Teacher, 40(8), 738-743.

Carroll, J.B. (1964). Language and Thought. Englewood Cliffs, N.J.: Prentice-Hall.

"^° V^La:!:;;^,nS-;rJrg^i-;' -n:'^.i^!;p--'""'^"v^e.on..raud. 

°°Sai?;:s fn^'sJ;;, ""(.rence. and Cultural 

Associati^! ^ Newark, Del: Internatiofial Reading 

Durkin, D. (1978-79). What classroom observations reveal about reading
comprehension instruction. Reading Research Quarterly, 14, 481-533.

Durkin, D. (1987). Testing in the kindergarten. The Reading Teacher, 40(8), 766-

'""'^Pr:;uce-H"i: .nc /°^'^ ' Jersey; 

sin" Inc e tilosophy .( Edurat.on Besearch. »e» York, John U.ley !, 

'" Ph-io,^;"; °' '-^^ co.petenco, 

"?i:i',?;"s;oy:rc::;'"a^;. ;:,'f=2;j:r" «• 






' w"'Hf;c;urt:i°f:. °"'"" r nl»^., »e„ 

Farr, R., & Carey, R.F. (1986). Reading: What Can Be Measured?

Frederiksen, N. (1984). The real test bias. American Psychologist, 39(3), 193-
""S^U^he^s."'""- THemnd^s Ne. SHen.e . „e, York, „.,., Basic B.c.s, ,„c. , 

'Cr^it^^-avhT^fL ;:fveP:::;te^r" ° °' "-^'^^ ^ • 

Goodman, N. (1983). Fact, Fiction, and Forecast. Cambridge, MA: MIT Press.

""^^ CA.= »ads.orth 

°"r;„:;ro:ir;tion """"^ A-ica™ 

Harman, G. (1984). Change in View. Cambridge, Mass.: MIT Press.

"s^Lar^^nt^i^nSi^i:; ,,, ;::;::;t.::. "-'"^ 

li;:fj:L"-^Sa~ 

Holland, J.H., Holyoak, K.J., Nisbett, R.E., & Thagard, P.R. (1986). Induction:
Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press.

--^r^'Idert iS^SlTSi: ]^^T\^--^"^ " 

"°E"cit°;S;,'i:!;L,;;;";-:?/?!-r- p^'-' Pi^^-es. 

" -^^^^d;;^Ih^^;lI;L^;'-^^ ^jl!,,;-^ assess readers- pri.r 

""nli-fiL.""''- "'■""'''"^ ^-^'P°^y "^■^in, . York, N.V.: 






Johnson-Laird, P.N. (1983). Mental Models: Towards a Cognitive Science of
Language, Inference, and Consciousness. Cambridge, Mass.: Harvard University
Press.

Johnson, N.L., & Kotz, S. (1977). Urn Models and Their Application. New York:
John Wiley & Sons.

'tlT.: Le:;:rL.::i;;. ^?-if^^-..-;„ --^^-^ comprehension test bias. 
'°S85'"'744'-748!''''* " evaluation experts. The Raadino TP.rhpr . 

Kahneman, D., & Tversky, A. (1981). On the study of statistical intuitions.
In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgement Under
Uncertainty: Heuristics and Biases. Cambridge, MA: Cambridge University Press.

""f" ri'^^': ^'^Pg'"^''g"t^l Prn.-prin res for fhp Bahavioral ^ri.nr... 

Belmont, CA: Wadsworth Publishing Company, Inc.

""'tT' "•"';o J"t'-oductjon to Prob.hilify Theory .nri <;f 
Inference (2nd edition). New York, N.Y.: John Wiley & Sons, Inc.

Manzo, A. (1970). Readability: A postscript. Elementary English, 47, 962-965.

"'S?3;pn'; nrir^'' '^°-P:^''^" = i°" Monitoring. In Patrick Di rkson (Ed. ) , 
Children s Oral Cnmnium rati on SkilU. New York: Academic Press. 

Mason, J. (1984). A schema-theoretic view of the reading process as a basis for
comprehension instruction. In G.G. Duffy, L.R. Roehler, & J. Mason (Eds.),
Comprehension Instruction (pp. 26-38). New York: Longman Inc.

McCloskey, M. (1983). Intuitive physics. Scientific American, 248, 122-130.

McConaughy, S. (1980). Developmental differences in summarizing short stories 

D:;:;op"m:n"rB:s;^nrMa:s°:^^^ ^--^^ ^-^--^^^ - ^^^^ 

"or;t^;J;; an"n:?;sJ;2;; ^rkM^ L"2i$!2ii^-"- 

Monteith, M.K. (1976). Readability formulas. Journal of Reading, 19, 604-607.

"''"InH r" '''c"' "-^y ^'-''itration o^ rules of inference. The Behavior.! 
and Brain SciPnrpg, 4(3), 349-350. 

''social J^dnenPot' ""^ 'I'''': Inference: StrateniP. .nH 

Social Judqenent . Englewood Cliffs, N.J.: Prentice-Hall, Inc. 






Unlvsrsity of Ne.f oundl and. Bessarch and Oevelop.ent, Be.orial 

""[^-Ul:''- c.pelence. Selene. Fd,..,Hnn 49, 

in press. critical .ninking theory. Teachers Cnl leoe Recnrri , 81(2), 

Pearson, P.O., Hansen, J, Se Gordon C motqi tk 1 1 . 
■ n(or.ation. Joornal ot „.,H,r.^ .i":;!, f;. "'"'"'^ 

introduction by ». Bo = an,.et.. cLridge, „l;va;d Jlif^J:!:? 
Phillips, I. «, 1,85). Categories of inference strategies. In M T Facan 

L^si^^Li^^^-vo'ff^i ' • .pr-:^",;;; "xr" '^r'""'"" - ' 

Council of Teachers oj English. = ■ "="""^1 

'^iilpJIienti^;. ^rklir^"^' 

Philosop^v'o'rLc 'tlorSocIety! 

Phillips, L.M., & Patterson, C.C. (1987). The Phillips-Patterson test of inference
ability in reading comprehension. St. John's, Newfoundland: Institute for
Educational Research and Development, Memorial University of Newfoundland.






Reading, Thinking and Writing: Results from the National Assessment of Reading
and Literature. Denver, Col.: Education Commission of the States.

Richards, I. A. (1938) Interpretation. n t.....^^ Y,,,, ^^^^^^ 

Rubin, A. (1981) C onceptual Readability . NeH Wavs to I nnt ;,f Tovf lo 

'tr; aoj:;?:h-pu b?;:her::^ ^ 

Scriven, M. (1976). Reasoning. New York: McGraw-Hill Book Company.

Slakter, M.J. (1967). Risk taking on objective examinations. American
Educational Research Journal, 4, 31-43.

^"iiH^ton." ""'^^^^^^"dinq RP^^inq. Ne« York: Holt, Rinehart and 

Spilich, G.J., Vesonder, G.T., Chiesi, H.L., & Voss, J.F. (1979). Text
processing of domain-related information for individuals with high and low
domain knowledge. Journal of Verbal Learning and Verbal Behavior, 18, 275-290.

Spiro, R.J. (1977). Remembering information from text: The "state of schema"
approach. In R.C. Anderson, R.J. Spiro, & W.E. Montague (Eds.), Schooling and
the Acquisition of Knowledge. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Spiro, R.J., & Myers, A. (1984). Individual differences and underlying cognitive
processes in reading. In P.D. Pearson (Ed.), Handbook of Reading Research
(pp. 471-501). New York: Longman Inc.

Spiro, R.J., ic Taylor, B.«. (in press). On investigating children's transition 
hoU^ic r^e.I" r"r°'l ^ultidimensioJa? KlZl l, 

(Ls ? Lir.r T ^" ^^^'"^y^ J- Mitchell, i P. Anders 

Standards for Educational and Psychological Testing. (1985). Washington,
D.C.: American Psychological Association.

Stanovich, K.E. (1980). Toward an interactive-compensatory model of individual
differences in the development of reading fluency. Reading Research
Quarterly, 16, 32-71.

Stein, N.L. (1983). On the goals, functions, and knowledge of reading and
writing. Contemporary Educational Psychology, 8, 261-292.

"ire' The fiihl^Ji^al '"'dT'^^'c"""'''""' ''''' ^hink you 

are. me Behavio. al ^nti Brain Scignrpg , 4(3)^ 339-340. 






13, 165-172. «ratten text. Journal of Rsadino Behavinr ^ 

Taylor, B.M. (1982). Text structure and children's comprehension and memory
for expository material. Journal of Educational Psychology, 74, 323-340.

TH^a^ar^^^^ Criteria .or theory choice. 

' nnlZ^i-. Z^:^ t^i^'^- - " — ^ - P=Vcholo.v and 

'-''-''^'^ ^^-^^V. Philosophy 

Thorndike, E.L. (1917). Reading as reasoning: A study of mistakes in paragraph
reading. Journal of Educational Psychology, 8, 323-332.

Vosniadou, S., & Ortony, A. (1983). The emergence of the literal-metaphorical-
anomalous distinction in young children. Child Development, 54, 154-161.

"'apti;uS;"'on"'°" ■ i«P-'-ce ct do.aih kno.Iedge end overall 

llllrltlr KnHI-Jr ° = in.or.ation. Cognition anH 



Walker, C.H., & Yekovich, F.R. (1984). Script-based inferences: Effects of text
and knowledge variables on recognition memory. Journal of Verbal Learning and
Verbal Behavior, 23, 357-370.

Wixson, K.K., Peters, C.W., Weber, E.M., & Roeber, E.D. (1987). New directions
in statewide reading assessment. The Reading Teacher, 40(8), 749-754.



Appendix A 
THE PHILLIPS-PATTERSON TEST OF 
INFERENCE ABILITY IN READING COMPREHENSION 



Memorial University of Newfoundland 
Reading Comprehension Tests 



PHILLIPS-PATTERSON
TEST OF
INFERENCE ABILITY
IN READING
COMPREHENSION

MULTIPLE-CHOICE FORMAT






LINDA M. PHILLIPS - CYNTHIA C. PATTERSON 

INSTITUTE FOR EDUCATIONAL RESEARCH AND DEVELOPMENT 
Memorial University of Newfoundland 
©1987 




DIRECTIONS 



Do Not Mark On This Booklet 

You will read three different stories. Questions follow each paragraph in the stories. Each question has 4 
answers provided. You are to choose the best answer and blacken its letter on the answer sheet. 

There are 36 questions and you should try to do all of them. To answer the questions you have to use infor- 
mation given in the stories and information you already know. The stories will not directly answer the questions. 
You will have to use your common sense and the story information. 

Here is an example: 



Example X: Chris had to write a story. Words like tennis, hockey, football, swimming, baseball, and skiing 
came to mind. So, Chris started to write a story on all of these. 
X. Chris wrote a story about 

(A) hockey. 

(B) nothing. 

(C) sports. 

(D) games. 



Remember: you must choose the best answer. You may think that more than one answer is good, but you
must choose the best answer from the 4 provided.

Be sure to read and think carefully before you decide which answer is the best one. 






UFOs 



Thousands of people around the world believe that they have seen unidentified flying objects. Anything
in the sky that people do not understand may be called a UFO. People sometimes call UFOs "flying saucers,"
"spaceships from other planets," and "extraterrestrial spacecraft." Sometimes weather satellites, clouds, and
bright stars are thought to be UFOs. Stories have been told that UFOs light up an area with many coloured
lights and that creatures of different sizes and colours have been seen in them. Another story is that UFOs drain
power from any electrical sources in the immediate area. The weather, the time of day, and the number of people
may make the UFO stories different.



1. UFOs are sometimes called other names because 

(A) people name them according to their shape or probable origin. 

(B) people know they are unidentified flying objects in the sky. 

(C) people see an area with many coloured lights in the sky. 

(D) people don't know what to call them, so they name them by shape. 



2. Something in the sky which is not understood may be called a UFO because

(A) that is a term used when people do not know what it is. 

(B) that is an idea about which stories have been told. 

(C) that is what people call it when they jump to conclusions. 

(D) that is the shape of whatever it is in the sky overhead. 



3. It is not known where UFOs come from. It seems they 

(A) could be from Earth, because we have the materials and people to build such crafts. 

(B) couldn't be from Earth, because if they were people would know what they are. 

(C) could be from almost anywhere, because things in the sky might be misnamed. 

(D) couldn't be from almost anywhere, because the story said they come from other planets.



4. UFO stories may be very different from each other because

(A) people tend to exaggerate what they see and think of different names for UFOs. 

(B) people think different things like weather satellites, clouds, and bright stars are UFOs. 

(C) people may not be sure of what they see when they see different things in the sky. 

(D) people may see many kinds of things in the sky at various times and places. 



Go on to next page



2 



UFOs 

Scientists became interested in the different UFO stories and decided to investigate them. Scientists use three 
kinds of reports to study UFOs. Sight reports by people indicate that UFOs are seen during the day and night.
Sometimes when people report seeing a UFO, radar stations report unusual events at the same time. Radar shows
the direction and speed of things in space such as storms, aircraft, and meteors, as well as unusual events. Reports
of physical proof such as ground prints, burned patches in fields, and melted patches in pavement are also studied
by scientists. Weather conditions are checked when scientists study available information about UFOs. Many
of the older reports are incomplete so we need to continue to study UFOs. 

5. Weather conditions affect UFO sightings in the sky because

(A) it may be stormy so people may be unsure of what they see.

(B) the story says conditions are checked by scientists in their work.

(C) they interfere with how well people can view and identify what they see.

(D) they cause damage to UFOs and interfere with how they work.

6. The three kinds of reports used by scientists to study UFOs are

(A) sight reports, radar reports, and reports of physical proof.

(B) sight reports on storms, aircraft, and meteors.

(C) sight reports on ground prints, burned patches, and melted patches.

(D) sight reports, radar reports, and reports of objects in space.

7. Using available information people learn the most about UFOs by

(A) collecting radar reports and interviewing spectators.

(B) checking radar stations which report unusual events.

(C) studying many of the older reports about UFOs.

(D) combining the information contained in all reports.



Go on to next page 






UFOs 



Scientists in the United States, Canada, and many other countries have studied UFOs for many years. In
1969, one group of scientists concluded that there was not enough evidence to prove that UFOs are real and
that UFOs were not worth further study. It seems that many people mistake heavenly bodies such as meteors
for UFOs. But some people are still interested because even scientists agree that ten percent of UFO sightings
are not explained.

8. Many people mistake heavenly bodies to be UFOs because 

(A) both might be bright objects in the sky. 

(B) people wish they could see a UFO. 

(C) people think both look different in the sky. 

(D) scientists have studied UFOs for years. 



9. The percentage of UFO sightings that are explained is

(A) 10 percent 

(B) 90 percent 

(C) 100 percent 

(D) 80 percent 



10. Some people think the study of UFOs should be continued because 

(A) other people are still interested in UFOs. 

(B) the evidence about UFOs is not complete. 

(C) not all sightings of UFOs are understood. 

(D) some scientists think UFOs are not real. 



Go on to next page



UFOs 



People are curious about UFOs. They want to know what has been seen. More studies might be done on 
UFOs using weather cameras in satellites. These cameras take pictures of cloud patterns and could photograph 
any high-flying objects. People are finding out more about space and the universe and many explanations are 
being given for UFOs. So, in the future UFOs may be known as IFOs. 

11. More evidence is available today about UFOs than years ago because 

(A) we have more books to get information. 

(B) there are more explanations being given. 

(C) there are more people observing space. 

(D) we have more scientific equipment. 



12. IFO means 

(A) identified flying orbits.

(B) identified flying objects. 

(C) imaginable flying objects. 

(D) unidentified flying objects. 



This is the end of the UFOs story. The next story you will read is called "Money".



Go on to next page 






Money 



Most people in our country use money almost everyday. You may know that money is used for buying and
selling without realizing how important it is. Money serves at least four important functions. It is a medium
of exchange for goods, such as candy or a movie, and for services, like dental care. It is a way of telling how
valuable goods are. For example, we know that a car is more valuable than a bicycle because we spend more
for the car. Money also serves as a unit of account, which means that payments, loans, and transfers of money
may be made from one person, company, or country to another. Money is also a store of wealth, which means 
that it may be saved for later use. 



13. Centimeters are used as a measure of height, degrees are used as a measure of temperature, and
money is used as a measure of 

(A) account. 

(B) function. 

(C) value. 

(D) cost. 

14. Money is a familiar part of our lives because 

(A) we use it to buy and to save. 

(B) we use it almost everyday. 

(C) we cannot exchange without it. 

(D) we earn it, spend it, and save it. 

15. If a chocolate bar and seventy-five cents are equal in value, then they would be an even 

(A) amount. 

(B) exchange. 



(C) buy.

(D) unit.



Go on to next page 




Money 



The topic of money has fascinated people for centuries. "Why don't they make enough money for everyone
to have a lot?" is a common question. There is no simple answer to that question, but money must have certain
characteristics. It must be rare or obtained as a result of work. For example, if money were to grow on trees, 
there would be too much of it so it wouldn't have any value. When goods and services are plentiful they usually 
cost less. Money should be easily recognized by buyers and sellers. It must be divisible, easy to carry, and able 
to take lots of use. The kind of money used in the past was not like the cash, checks, and credit cards we use today. 



16. Umbrellas cost more when it is raining; oil costs more when it is scarce. It seems that the price
of goods goes up when

(A) people are fascinated by money.

(B) people just have to buy them.

(C) people are about to purchase. 

(D) people do not have to buy them. 



17. Diamonds are rare and valuable but are not as usable as money because 

(A) they do not have the same features as money. 

(B) money must be something that everyone knows. 

(C) they do not divide easily and may not be easy to carry. 

(D) money must be rare or obtained as a result of work. 



18. If there were not enough money for everyone to have some, then money would become 

(A) unpopular. 

(B) expensive. 

(C) valuable. 

(D) divisible. 



Go on to next page - 






Money 



Thousands of years ago, people made trades among themselves to get whatever they needed. For example,
if you kept cows but needed more land, you would have to find someone who needed cows and had land they
were willing to exchange. It might cost you ten cows for a piece of land. If you needed a cooking pot then it
might cost you a cow, but it might not be worth a whole cow. So, small items such as shells, whales' teeth,
tobacco, coffee, and salt started to be used for trade. Hundreds of years ago, people started using metal such 
as coins for trading. More recently, paper money came into use when countries started trading farther from home. 



19. It was difficult to give someone change using trade items because 

(A) a pot might have cost a whole cow. 

(B) it was hard to know what change was. 

(C) a cow was not very easily divided. 

(D) they were usually exchanged whole. 



20. It became necessary to use paper money when countries started trading farther from home
because 

(A) people could not trade conveniently with exchange items. 

(B) they had already started to trade items by using metal. 

(C) nobody could be sure of what type of items to trade. 

(D) it did not make sense to use anything else to trade items. 



21. The money system is different from the trade system because 

(A) money gives things a standard unit whereas trade does not.

(B) one animal could be worth more in trading than another. 

(C) it might cost you ten cows for a piece of land in trading. 

(D) money is newer and is what people use all the time. 



Go on to next page - 



Money 



Money forms are necessary and have changed over the years from cows, to whales' teeth, to paper, to plas- 
tic. A credit card may be used for goods and services because people agree to pay the bill at a later date. Credit 
cards are used today almost as widely as money to buy things such as meals, clothes, records, gas, or bus passes. 
No one knows what form of money people will use in the future. It seems more and more money will be handled 
by computers. Some kind of money system will always be a part of our lives. In the future, perhaps we will 
look at money the way we now look at trade items. You might be using a credit card to buy a chocolate bar
and a computer to take care of your money. 



22. A credit card is sometimes called plastic money because 

(A) it can be used to pay for goods and services. 

(B) it is made of plastic and can be used like money. 

(C) it is plastic and not metal like money used to be. 

(D) it can be used when you do not have money with you. 



23. Complete the following to show how money systems have changed over the years: cow is to 
money, as money is to 

(A) trade items. 

(B) club cards. 

(C) credit cards. 

(D) chocolate bars. 

24. Some kind of money system is needed because 

(A) people need to buy and sell things necessary for living.

(B) no one knows what people will use in the future. 

(C) without money there would be no people to use it. 

(D) people need to buy goods and services for daily living. 



This is the end of the Money story. The next story you will read is called "The Wrong Newspapers".



Go on to next page - 







The Wrong Newspapers 

Ann pedalled her bicycle faster as she headed up the hill to the last house on her paper route. The driveway
was long, so she had been told that she could throw the paper from the road. "Strange!" she thought, "Yesterday's
paper is still lying there." The newspaper, in a plastic bag, lay in a puddle left from yesterday's storm.
She knew that Mr. Jones wasn't away. Ann was late for dinner so she rode on. She could feel the rain starting
again as she turned to look at the White's dog barking at her. The last two weeks had been sunny, until Monday.



25. The newspaper was in a plastic bag because

(A) it was the last paper so Ann left it in the bag.

(B) it was raining yesterday and the paper would have gotten wet.

(C) it was too stormy yesterday for Mr. Jones to get it.

(D) it was lying in the driveway since yesterday.



26. Yesterday's paper was still lying in the puddle because

(A) it was in a plastic bag.

(B) someone should have picked it up. 

(C) something odd has happened. 

(D) the weather was too wet. 



Go on to next page 






The Wrong Newspapers 



As Ann came in the front door, the phone was ringing. "Ann, this is Mr. Jones. Yesterday I received a
newspaper that wasn't the right copy. Tonight it has happened again. What's going on?"

"Mr. Jones, I just delivered your newspaper fifteen minutes ago. I saw yesterday's newspaper lying in a
puddle at the end of your driveway. Haven't you been picking them up?" Ann asked.

"Of course I have a newspaper! Tomorrow, I want tomorrow's paper delivered," Mr. Jones yelled, and
slammed the phone. 

Ann scratched her head, puzzled. The rain drummed down on the roof and the thunder roared. The mystery 
would have to wait until tomorrow. 



27. What had happened to Mr. Jones's newspapers two days in a row? 

(A) Ann thought he had not picked them up. 

(B) The papers were coming a day late. 

(C) Ann saw yesterday's paper in a puddle. 

(D) He had not gotten the right day's paper. 



28. The mystery Ann has to solve is 

(A) to check why yesterday's paper is lying in a puddle. 

(B) to find out where the wrong papers are coming from. 

(C) to find out when and where the papers are disappearing. 

(D) to check why there is a problem with the papers. 



Go on to next page 






The Wrong Newspapers 



The next day, Wednesday, Ann picked up her papers as usual and rode around her paper route. It was 
raining again. She skipped the White's house, as they had been away on vacation since Sunday, and headed 
up the Jones's driveway. She would hand deliver Mr. Jones's newspaper. Ann saw Monday's and Tuesday's 
papers still lying in the driveway. As Ann rang the doorbell, she could see Mr. Jones sitting in the living room. 
He waved at Ann to come in. Mr. Jones had a cast on his leg, and his crutches rested against a chair. Ann won- 
dered how Mr. Jones had been getting a newspaper since he lived alone. 



29. What was the first day Ann noticed the newspaper lying in the driveway? 

(A) Sunday 

(B) Monday 

(C) Tuesday 

(D) Wednesday 



30. Ann wanted to hand deliver Mr. Jones's newspaper 

(A) so she could ask him where he was getting his newspaper. 

(B) because she wondered how he had been getting a newspaper. 

(C) to make sure he got it and to talk to him about the mix-up. 

(D) to show him she had been delivering her papers every day. 



Go on to next page 






The Wrong Newspapers 



Ann asked Mr. Jones what had happened to his leg. Mr. Jones responded that he had fallen from a ladder 
on Monday. Mr. Jones was upset by the weather since he could not get outside for a couple of days. Just then, 
Skippy, the White's dog, came trotting in the living room. In his mouth was a newspaper, which he gave to 
Mr. Jones. Mr. Jones handed Skippy a cookie. Ann smiled and thought that part of the mystery was solved. 



31. The part of the mystery that was solved was 

(A) the way Mr. Jones got the newspapers. 

(B) the way Mr. Jones broke his leg. 

(C) that Skippy was handed a cookie. 

(D) that Skippy was part of the puzzle. 



32. The part of the mystery still to be solved was 

(A) where Skippy was going everyday. 

(B) where the right newspapers had gone. 

(C) where the wrong newspapers were coming from. 

(D) why Mr. Jones handed Skippy a cookie. 



33. Ann could find out where the wrong newspapers came from by 

(A) following the dog. 

(B) giving the dog a cookie. 

(C) asking Mr. Jones. 

(D) calling the paper company. 



Go on to next page - 






The Wrong Newspapers 



"May I borrow one of your cookies, Mr. Jones?" Ann asked. Ann showed Skippy the cookie. "Get the
paper, Skippy!" Skippy jumped through a hole in the fence and ran to his owners' house just next door. They
watched Skippy take a rolled-up paper from a stack of old papers on the White's porch and run back to the
Jones's house. The right newspapers were lying at the bottom of Mr. Jones's long driveway. Ann had solved
the mystery of the wrong newspapers!

34. The White's dog was with Mr. Jones because 

(A) the dog would get his newspaper for him. 

(B) he cared for Skippy while they were on holidays.

(C) the dog lived in the house next door. 

(D) he needed the dog to help because of his broken leg. 



35. It was quicker for Skippy to get an old newspaper than the right one because 

(A) the old ones were on Skippy's porch so he knew just where to go. 

(B) the old ones were closer and the right ones were at the end of the driveway. 

(C) the right papers were in a bag and the old papers were rolled up. 

(D) the right newspapers were lying at the bottom of the long driveway. 

36. The mystery of the wrong newspapers happened because 

(A) the right newspapers were left in the driveway. 

(B) Mr. Jones had his leg broken and the newspapers could not be delivered. 

(C) Mr. Jones accepted the old newspapers from Skippy. 

(D) Skippy did not go to the right place to get the newspapers. 



This is the end of the test. 
Check your answers if you have time. 






Appendix B 

READING RATING SCALE FOR 
PHILLIPS-PATTERSON TEST OF INFERENCE ABILITY IN READING COMPREHENSION 






Reading Rating Scale for Phillips-Patterson Test
of Inference Ability in Reading Comprehension

students"''"' " ''''''' °^ interpretation given by 

i. ^"Ttr^t ';:::;b^rt":; 'srioe^rornEs^H rii^di^t^ °! ?• 

TIA Reading Evaluation

3 The student integrates relevant text information and
background knowledge to construct complete interpretations
that are consistent with both the text information and
background knowledge. Thus, the student has given a
complete inference answer.

2 The student integrates some text information and background
knowledge but fails to take into account the available
relevant information. The student's answer is consistent
with some relevant text information and background
knowledge but is incomplete. Thus, the student has given a
partially-correct inference answer.

1 The student locates relevant text information but fails to
integrate it with relevant background knowledge. Thus, the
student has given a non-inference answer.

0 The student makes an inconsistent use of the text
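
The four rating levels above can be held in a small lookup when ratings are recorded by computer. The following is only an illustrative sketch and is not part of the original instrument: the names `ReadingRating`, `LABELS`, and `label_for` are assumptions, and the label strings are taken from the example headings in this appendix.

```python
from enum import IntEnum


class ReadingRating(IntEnum):
    """Four rating levels of the scale above (member names are illustrative)."""
    IMPLAUSIBLE = 0
    NON_INFERENCE = 1
    PARTIALLY_CORRECT = 2
    COMPLETE = 3


# Labels as they appear in the example headings of this appendix.
LABELS = {
    ReadingRating.COMPLETE: "complete inference",
    ReadingRating.PARTIALLY_CORRECT: "partially-correct inference",
    ReadingRating.NON_INFERENCE: "non-inference",
    ReadingRating.IMPLAUSIBLE: "implausible answer",
}


def label_for(points: int) -> str:
    """Return the label for a rating; raises ValueError outside 0 to 3."""
    return LABELS[ReadingRating(points)]
```

For instance, `label_for(2)` gives the partially-correct label, while a value such as 4 is rejected rather than silently accepted.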

3 points (complete inference)




recognizes the story says that scientists use three kinds of reports to study
UFOs. The most information about UFOs would be attained by reading all three,
thus the preceding answer is consistent and complete with both text
information and background knowledge.

(2) "More evidence is available today about UFOs than years ago

[illegible] equipment to study UFOs." In this example,

the student uses the text information about potential use of weather cameras
in satellites and increased knowledge of the universe to compare the present
to the past. This information is then used to further reason that scientific
equipment is an important factor in learning more about UFOs, thus the

^:c^^^l^^^^^^^e;g^""^^ -^^^^^ ^^^^ - 

2 points (partially-correct inference)

The student gives an answer that indicates an integration of some text
information and background knowledge but fails to take into account
available, relevant information. The answer is consistent with some text
information and relevant background knowledge, but is incomplete. Examples
of partially-complete inference answers follow:

(1) "UFOs are sometimes called other names because people don't know what
to call them, so they name them by shape." In this example, the student reasoned
from some relevant text information but did not provide alternate
interpretations. The student did not appear to monitor for consistency and
completeness with available text evidence, that is, UFOs are called other
P^\"any-\":rr::t:'^^^ P^eceding^^^s^ef ^^s^JJi; 

(2) "Something unidentified in the sky may be called a UFO because that
is what people call it when they jump to conclusions." In this example, the

n " ve"?he"r'e T '''' information but did n^E rLafn 

bu" i d^J n„ ^ ^"^ T"'' ' '""'^ ' statement may be true, 

Jext infnrLf "P'-"^"'= ^ "mplete interpretation of available, relevan 
text information pertaining to the question posed. 

(3) "It is not known where UFOs come from. It seems they could be from
Earth, because we have the materials and people to build such crafts." In this
example the student reasoned from an unwarranted assumption to a justifiable
alternative interpretation. The student has constructed an interpretation
but overlooks the fact that we do not know what UFOs are, so how could

are misnamed spacecraft from Earth which makes the answer partially-correct.




1 Point (non-inference) 



It may be that the student did not understand the task, or that the
student is more accustomed to non-inferential questions which often require
the mere location of information than to inference questions which require
an integration of text information and background knowledge. In the
latter case, students give an answer directly related to the text or an
answer which reflects minimal substantiation with the text evidence.
Examples of non-inference answers follow:

(1) "UFO stories are very different from each other because people
sometimes think things like weather satellites, clouds, and bright stars are
UFOs stJr"y ' ''''' ^^"'^^"^ '^-^^'iv t"e 

(2) "It is not known where UFOs come from. It seems they couldn't be from
almost anywhere, because the story said they come from other planets." In this

:::;?!rc i::^ -nfti^tioT^^ ~ 

0 point (implausible answer)

It may be that the student did not understand the task, or that the
student's answer is unsubstantiated. Examples of implausible answers follow:

(1) "UFOs are called other names because people know they
answer is circular; it does not answer the item.

inapproori/i^P ..=n i\ f ] exaaple, the student makes 

are re and M^n ""'^ """"^'^ ^o prove that 5fOs 

I'plausibi: '''' further study.- The answer given is 

i. Ill ^h!"^''!^"^ unidentified in the sky nay be called a UFO because that 

th '^^ ^" ^''y-" ^" this exa«ple, the s udent wa 

:p^"Hy":a?"irin\Oe"":ky"is?"^"" ^^^^ ^^P^ - 

"UFO stories are very different from each other because people tend

to exaggerate what they see and think of different names for UFOs." The

a h Jh "? ^^^^^^"^^ information whi h ay " e 

r:::'d:::eJ:;:"tio;.^;^t;j: ran^we:/-^" ^^^^^^-^ 






Appendix C 
THINKING RATING SCALE 






Thinking Rating Scale for Phillips-Patterson Test 
of Inference Ability in Reading Comprehension 

One of the variables derived to appraise the quality of TIA is a thinking
[illegible] based upon analysis of verbal report interviews as
students cited why they chose a particular answer as the best answer on the

TIA contains 3 stories each with 12 items. Students were asked to think
aloud on one of the three stories. For each item a student's thinking will
be rated between 0 and 3 for a total of 36 points if a student

"uiiing :L7e:''"- ^''^"^^"^ ''''' ""^^-9 to tie 

TIA Thinking Evaluation

3 The student cites all relevant textual and background
information in the explanation of an answer choice. That
is, the student considers the question and the available
textual and background information pertinent to it in the
formulation of a response which is complete and consistent.

2 The student cites some of the relevant textual and
background information in the explanation. That is, the
student considers either a part of the question and the
available textual and background information pertinent to
it, or the student considers the question and part of the
available information pertinent to it in the formulation of
a response which is consistent but incomplete.

1 The student cites insufficient, relevant textual and
background information in the formulation of a response.
That is, the student's response is not sufficient to
indicate a clear understanding of either the question or
the story. It is in need of elaboration and contains
information which is partially correct and partially
erroneous. However, it does reflect minimal integration of
relevant information.

0 The student cites irrelevant or erroneous or repeats
textual, background information, or both in the formulation
of a response. That is, the student either misunderstands,
misconstrues the story, or repeats the selected answer or
textual information with no interpretation.
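
The arithmetic behind the thinking score (12 items in the one story a student thinks aloud on, each rated 0 to 3, for a maximum of 36 points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name `thinking_total` and the decision to reject incomplete or out-of-range rating lists are hypothetical.

```python
ITEMS_PER_STORY = 12   # each TIA story has 12 items
MAX_RATING = 3         # each item is rated 0, 1, 2, or 3


def thinking_total(ratings):
    """Sum the thinking ratings for the 12 items of one story.

    Assumption: a list with a missing or out-of-range rating is rejected
    rather than scored, so a total is always out of 36.
    """
    ratings = list(ratings)
    if len(ratings) != ITEMS_PER_STORY:
        raise ValueError("expected one rating for each of the 12 items")
    if any(r not in range(MAX_RATING + 1) for r in ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    return sum(ratings)
```

A student rated 3 on every item of the story reaches the 36-point maximum described above.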






of re.sonin, .bi ity, not e.pressi" lb i v tL ""J""^' V ' 

0°; ^^v:^.--^ .:po^r^'r..[:-"-b^^° oj b'j: 

,fn« ■'"^^^^^"tion they do so having made an answer choice, so the storv 

. = 1= 1. J cHui t incerview. m other words, when students tell whv thpv 
selected a particular answer to be the "hp.;f' fhn . ^ 2 7 ^ ^ 
response .ust he used in rating .^aU t/^f stuS't ^^l'' ' ^^"""^ 

^-er!rV;t^":jj°::rh" ^;::s;r:ntj:i::: -^^^ 

stories on m. Test ite« ste«s are bolded and student "be t" .n%L h 

coaplete the test ite«. Student interview llT.TnL f" 

followed by an evaluutor's co.«ents. then presented 



Item 1 on the UFOs story is evaluated as follows:

3 points 



2 points

UFOs arf! 



-haf'Jo'"' ^f!; """" because people don't know 

..nSH''M!" °" °' I" 'o^srred about »hy UFOs .ioht be 

tl? i ;^'r;oS.'"V^^^ 

^JL'^iJt; .'n"t'o°7:i;°„",„r:orj:rb::rant:;/""'=- 

1 point 

UFOs are sometimes called other names because people see an area with many
coloured lights in the sky.

In this instance, the student failed to make the intended inference in

0 Point 

Item 9 on the Money story is evaluated as follows: 

3 points 

The money system is different from the trade system because money gave things
a standard unit whereas trade did not.

b.c.,rouni i„,„r.alio„, «hij the above stuJm h.s' J""' 

2 Points 

-^:^.rTL"t^a^?ig":^rt:-:t::r^^-^^ - — 

stu"l"^n"^';r''''";f:ht°"h "r^""!!' '"^^"^ giving ™i,k and 

for a Piece 0 UnS ? ^ ° 5 '"'^ =° ^^'"^ wouldn't be a fair trade 

for a piece of land. The cows and the land wouldn't be worth the same."

te« f :n7b°;;L '''' °* ThTst dent used 

textual and background information to cite the inequities of the trade system
but did not speak to the money system.






1 Point 



The money system is different from the trade system because it might cost you
ten cows for a piece of land in trading.

».tb.ul an, i„di„li.„ 0, having reasoned through Ihe ^uluin! 

0 Point 

The money system is different from the trade system because money is what

people use all the time so it is newer.

Student says, "everybody knows what money is and how to use it."

-^^^-"■e'^e::n^'^"^:; j„i :h'--h'ier:r:„\trX i?:.— 

Item 2 on The Wrong Newspapers story is evaluated as follows:

3 Points 

Yesterday's paper was still lying in the puddle because something odd has happened.

and'.T^is'pecilUr'thaTh '1' 1""^ 

the r!in":cL:e"'in" I! 'a e'j "d ifi: h^^r'L'n^r'r" " "'il'"'' 
happened «„,.a, . he uue is a o,.-.h".^:-;^ 

— ."";»":./:.itper:uj"7?;n: r.:r;::i,::- 

2 Points 

Yesterday's paper was still lying in the puddle because the weather was too wet.

.0 ^n: I'j'n .;,";ri!^!: - - c-.^ o„t 

(ori:,!uV!:sii;i!aJi=r'""! infor.ation to 

- ■""^-^^-°--^th^"^Lf^„s^^^:^L^v°„"or-^ 

1 point 

Yesterday's paper was still lying in the puddle because it was in a plastic bag.
Student says, "It says in the story that's how Ann left the paper."

--""-"""^"-"'^^^t^^tnr 

textual information.



0 point 

'"clY/V " ^^-'^ lave been 






Appendix D 
DIRECTIONS TO TEACHERS 



Directions for Administering the Phillips-Patterson Test
of Inference Ability in Reading Comprehension



The teacher should be familiar with the test directions prior to
administering the test because some discussion with students is
necessary. Teacher comments to be read aloud to students appear in bold.

An approximate administration time including directions is 40-50 minutes. 

forV'ef'er'e'nce"' ^° ^"P ^ "P^ 

2. Make sure each student has a pencil and eraser.

3. Say: The test which you are going to do will help educators learn
about students' reading ability. Students in grades 6, 7, and 8
across Canada are cooperating in the study. You are a very important part
of future improvements in the teaching of reading. This test does not

in - t^^- readin " uJs " 

can on thU test " so.ethinq. Try to do the best that you 

4. Direct students to their answer sheet and have them complete the
information section on the answer sheet. 

Say: Look at your answer sheet. Print the name of your school, your

provided at the top. You do not put your name on anything.

Allow time for the students to complete the information requested.

^' them'nnJ'''/''"'^';'^' ^'""^ P^^e of the test booklet. Advise 

them not to mark on the test booklet. Point to the "Directions". Ask if
everyone is looking in the appropriate place. Scan the class to see

Say: Follow along as I read aloud the "Directions".

Say: Think about which answer you think is the best one. 

Allow time for the students to read and to think about their answer. 

Cir'cll"[i;e'°il?r "I"": ' °" ^'^--^^ =heet. 

bes" one corresponds 'co the answer that you think is the 






6. When all the students have marked the answer whirh fhow l 
direct the students' attention back to the Jront . ^ m 

work through the 4 answers provided °* ^''^ '^"'^ ^"'^ 

Note: It is important that students understand why they should consider
all possible answers before deciding which answer is the best
one and why they have to use information given in the story and
information they already know in deciding upon the best answer. The
following discussion guide should help to make these points.

Say: What about answer A, hockey? Is that a good answer? Why?

It is important for students to understand that even though "hockey" is
mentioned, so are "tennis, football, swimming, baseball, and skiing".

Some of us might like to write about "hockey" because we like it

:"d"',i g;:t*Mt"2hr"jr'^"S'd ^"^r^*^ inforiat-^rr :vii'r:,iz 

Jj'sIvs'^-ward^H^''"; * 90 back to paragraph X 

What about .wi«.ing? I, it u"llW cll !rf I V * 9"^. 

Baseball ii u-uaiiv .In I called a "gaee" or a "sport"? 

called "5no?t.- si^ . * 3*"- Skiing and 5Wi«ing are usually 

sport. . So, "gages" would not include "s«j..ing" and "skiing". 

paragraph so it U f^! ?! *^ activities eentioned in the 

EhinLg ibout: '"'''^ '''' '''''''' Chris was 

ha'vHcorid rp*oint"'An'.UJ'2'i:"::"^^^;H^' '''''' '"••'•^ « V- -"^'^ 
given you 2 oniit. Til n P°'"^»- ^ have 

3 p:intTif'ySu°M';e';h":: oT" ' Vou would get 

If you have marked a different answer, you may erase it if you wish. It






doesn't really matter because Example X is just for practice.
Does anybody have any questions? 

Are there any questions about the example we have done?
Are there any questions about how the answers are scored?

7. If there are no questions or comments, 

Say: There are 3 stories in the test booklet. The first story is called
UFOs; the second, Money; and the third story, The Wrong Newspapers.

Read each paragraph and the questions which go with it. Decide which
answer you think is best and circle that letter on your answer sheet.

8. Direct students' attention to the answer sheet. Point out that there are 
three columns on the answer sheet, one for each of the stories. Tee re 

shee't"" JcJe; th''"'h "''tT.' '° '''' '''' """""'^ °" — 

bJoUeJ ^^^"^ ^""^ Hith in the test 

Are there any questions about how and where to mark the answer sheet?
Are there any other questions? 

9. If there are no further questions. 

Say: You have about 30 minutes to complete the test. Try to complete all
of the questions. You may open your test booklets and start now.

ar^r'no^" ^''^ classroom to ensure that students 

are not experiencing procedural difficulties. If students ask for
assistance with vocabulary or any of the test content, do not offer help.
Students are to do as well as they can on their own.






Appendix E 
KEY TO TIA ANSWER SCALE 






UFOs 

1. (A) = 3

(B) = 0

(C) = 1

(D) = 2

2. (A) = 3

(B) = 1

(C) = 2

(D) = 0 

3. (A) = 2 

(B) = 0 

(C) = 3 

(D) = 1 

4. (A) = 0 

(B) = 1

(C) = 2 

(D) = 3 

5. (A) = 2 

(B) = 1 

(C) = 3 

(D) = 0 

6. (A) = 3 

(B) = 0 

(C) = 1

(D) = 2 

7. (A) = 2 

(B) = 1 

(C) = 0 

(D) = 3 

8. (A) = 3 

(B) = 2 

(C) = 0 

(D) = 1 

9. (A) = 1 

(B) = 3 

(C) = 0 

(D) = 2 

10. (A) = 1

(B) = 2 

(C) = 3 

(D) = 0 

11. (A) = 0 

(B) = 1

(C) = 2 

(D) = 3 

12. (A) = 0 

(B) = 3 

(C) = 2 

(D) = 1



Key to TIA Answer Scale 

MONEY

13. (A) = 1 

(B) = 0 

(C) = 3 

(D) = 2 

14. (A) = 2 

(B) = 1 

(C) = 0 

(D) = 3 

15. (A) = 0 

(B) = 3 

(C) = 2 

(D) = 1 

16. (A) = 1 

(B) = 3 

(C) = 2 

(D) = 0 

17. (A) = 3 

(B) = 0 

(C) = 2 

(D) = 1 

18. (A) = 0 

(B) = 2 

(C) = 3 

(D) = 1 

19. (A) = 1 

(B) = 0 

(C) = 2

(D) = 3 

20. (A) = 3 

(B) = 1 

(C) = 2 

(D) = 0 

21. (A) = 3 

(B) = 2 

(C) = 1 

(D) = 0 

22. (A) = 1 

(B) = 3 

(C) = 0 

(D) = 2 

23. (A) = 0 

(B) = 2 

(C) = 3 

(D) = 1 

24. (A) = 3 

(B) = 1 

(C) = 0 

(D) = 2 






NEWSPAPERS 

25. (A) = 2 

(B) = 3 

(C) = 0 

(D) = 1

26. (A) = 1 

(B) = 0 

(C) = 3 

(D) = 2 

27. (A) = 2 

(B) = 0 

(C) = 1 

(D) = 3 

28. (A) = 1 

(B) = 3 

(C) = 0 

(D) = 2 

29. (A) = 0 

(B) = 2 

(C) = 3 

(D) = 1 

30. (A) = 2 

(B) = 1 

(C) = 3 

(D) = 0 

31. (A) = 3 

(B) = 0 

(C) = 1 

(D) = 2 

32. (A) = 2 

(B) = 0 

(C) = 3 

(D) = 1 

33. (A) = 3 

(B) = 2 

(C) = 0 

(D) = 1 

34. (A) = 2 

(B) = 3 

(C) = 1 

(D) = 0 

35. (A) = 2 

(B) = 3 

(C) = 0 

(D) = 1 

36. (A) = 1 

(B) = 0 

(C) = 2 

(D) = 3 
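
The key above lends itself to a simple lookup when answer sheets are scored by computer. The sketch below is offered only as an illustration under stated assumptions and is not the scoring procedure used in the study. The names `KEY_ROWS`, `STORIES`, `item_points`, and `score_sheet` are hypothetical; unanswered items are assumed to earn no points; and the value for item 2 (B), which is not legible in the key, is inferred to be 1 because every other item uses each of 0, 1, 2, and 3 exactly once.

```python
# Points for options A, B, C, D of each item, transcribed from the key above.
# Item 2 (B) is not legible in the source; 1 is an inferred value (assumption).
KEY_ROWS = (
    "3012 3120 2031 0123 2130 3012 2103 3201 1302 1230 0123 0321 "  # UFOs, items 1-12
    "1032 2103 0321 1320 3021 0231 1023 3120 3210 1302 0231 3102 "  # Money, items 13-24
    "2301 1032 2013 1302 0231 2130 3012 2031 3201 2301 2301 1023"   # The Wrong Newspapers, items 25-36
).split()

# Item numbers belonging to each story.
STORIES = {
    "UFOs": range(1, 13),
    "Money": range(13, 25),
    "The Wrong Newspapers": range(25, 37),
}


def item_points(item: int, choice: str) -> int:
    """Return the points (0 to 3) for one circled letter on one item."""
    return int(KEY_ROWS[item - 1]["ABCD".index(choice)])


def score_sheet(answers: dict[int, str]) -> dict[str, int]:
    """Score one answer sheet given as {item number: circled letter}.

    Assumption: an item with no answer simply contributes no points.
    """
    totals = {
        story: sum(item_points(i, answers[i]) for i in items if i in answers)
        for story, items in STORIES.items()
    }
    totals["Total"] = sum(totals.values())
    return totals
```

For example, `score_sheet({1: "A", 2: "C", 13: "C"})` credits the two UFOs items and the one Money item separately before adding them into the total.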



